Camera Calibration for Coastal Monitoring Using Available Snapshot Images

Abstract: Joint intrinsic and extrinsic calibration from a single snapshot is a common requirement in coastal monitoring practice. This work analyzes the influence of different aspects, such as the distribution of Ground Control Points (GCPs) or the image obliquity, on the quality of the calibration for two different mathematical models (one being a simplification of the other). The performance of the two models is assessed using extensive laboratory data (i.e., snapshots of a grid). While both models are able to properly adjust the GCPs, the simpler model gives a better overall performance when the GCPs are not well distributed over the image. Furthermore, the simpler model allows for better recovery of the camera position and orientation.


Introduction
Coastal monitoring systems using digital video cameras have become a widely used tool to study near-shore processes since the advent of the ARGUS system over 20 years ago [1,2]. At present, besides the original ARGUS developments, there exists a wide variety of packages to manage image acquisition and processing ([3][4][5][6], among others). Video monitoring systems have been shown to be useful, to cite just a few examples, for obtaining intertidal and subaquatic bathymetries [7][8][9], detecting and analyzing shoreline dynamics [10,11], and studying the morphodynamics of beach systems [12,13]. Camera calibration is critical in coastal video monitoring systems, as it allows us to relate pixels in the images to real-world co-ordinates and vice versa.
Camera calibration in coastal video monitoring follows close-range photogrammetric procedures [1,14]. Even though the distances to the monitored objects (i.e., beaches) are up to ∼1000 m, the hypotheses of close-range calibration apply (e.g., atmospheric refraction is negligible, whereas lens distortion is not). In fact, in ARGUS-like fixed stations, it is common practice to obtain the parameters related to lens distortion (intrinsic calibration parameters) prior to final deployment through classic close-range methods, using chessboard or similar patterns [1,6]. The camera position and orientation (extrinsic calibration parameters) are then obtained through Ground Control Points (GCPs); that is, pixels whose real-world co-ordinates are known. The literature on full (intrinsic and extrinsic parameters) close-range camera calibration is extensive, and includes studies on the governing equations [15][16][17], the calibration procedures [17][18][19][20][21], and applications including structure-from-motion and multi-camera approaches [22][23][24]. However, few works have dealt with full calibration from a single image using a few GCPs.
In most coastal ARGUS-like monitoring systems, the intrinsic parameters are obtained prior to the final deployment of the camera, as mentioned above, and the extrinsic parameters are then obtained through GCPs. In many practical situations, however, intrinsic calibration of the camera is not available. This is the case, for example, when using available surfcams around the globe to obtain morphodynamic information [25] or in the CoastSnap project [14], a citizen science project in which citizens provide smartphone images for some given beaches. In general, taking advantage of the huge amount of freely available coastal images for morphodynamic studies and coastal management is a challenge for the research community. In such situations, all of the calibration parameters (i.e., both intrinsic and extrinsic) must be obtained from the GCPs [25]. In a calibration campaign, a large number of targets (GCPs) can be spread over the entire image and high-quality calibrations can be obtained. In the practical situation we want to address, it is only possible to use fixed features and, as large portions of the images are sand, water, or sky, the GCPs are restricted to a relatively small part of the image. For illustrative purposes, Figure 1 includes two snapshots from Castelldefels and Barcelona beaches (Spain, see coo.icm.csic.es): in Figure 1A, the GCPs are usually in the lower half of the image; while in Figure 1B, they mainly lie in the right and lower parts. In addition, the number of points used is usually small; for example, [14] used only seven GCPs for georectification of community-contributed images. Such a low number of GCPs also raises the question of which calibration model is the most suitable while remaining in the domain of close-range photogrammetry. Please note that this is very different from what is usually found in close-range photogrammetry, where the calibration is done using a large number of points.
In summary, new demands on coastal monitoring systems require further understanding of image calibration when a reduced number of GCPs must be chosen within only a small region of the image. The main objective of this contribution is to determine the most suitable GCP distributions and calibration model to georectify images on coastal monitoring systems. To do this, we assume that there is only a single snapshot available to obtain a full camera calibration (intrinsic and extrinsic parameters) with a reduced number of GCPs. In addition, the premises of close-range photogrammetry and the non-use of wide angle lenses are considered. Two mathematical models are considered, one being a simplification of the other. The influence of the obliquity of the snapshot or the GCP distribution throughout the image on the calibration quality is analyzed. The ability to accurately recover some useful calibration parameters (e.g., camera position) is also discussed.

Camera Mathematical Models
The pinhole model [26], together with the Brown-Conrady model [27] for decentered lens distortion, provides the governing equations typically used for cameras in coastal video monitoring systems; see Figure 2. Given the real-world co-ordinates of a point, $\mathbf{x} = (x, y, z)$, its pixel position, in terms of column $c$ and row $r$, is given by Equation (1), where $k_1$, $k_2$, $p_1$, $p_2$, $s_c$, $s_r$, $o_c$, and $o_r$ are free parameters; higher-order distortion terms are omitted because we do not consider wide-angle lenses. Furthermore, $d_U^2 = x_U^2 + y_U^2$, and $x_U$ and $y_U$ are given by Equation (2), where $\mathbf{x}_c = (x_c, y_c, z_c)$ is the optical center (camera position); $\mathbf{e}_u$, $\mathbf{e}_v$, and $\mathbf{e}_f$ are orthogonal unit vectors given by the Eulerian angles of the camera (azimuth $\phi$, roll $\sigma$, and tilt $\tau$); and $k_c$ and $k_r$ stand for the (known) pixel co-ordinates of the center of the image. The inversion of Equations (1) and (2) allows us to obtain the real-world co-ordinates of a pixel if an extra condition is given (typically, the point lying in a horizontal plane $z = z_0$); this inversion requires iterative procedures to solve the implicit equations. Overall, 14 parameters need to be established to calibrate the above (mathematical) camera model.
The intrinsic parameters are as follows:
• radial and tangential distortions: $k_1$, $k_2$, $p_1$, and $p_2$ (dimensionless);
• pixel size: $s_c$ and $s_r$ (dimensionless); and
• decentering: $o_c$ and $o_r$ (in pixels),
and the extrinsic parameters are:
• real-world co-ordinates of the center of vision: $x_c$, $y_c$, and $z_c$ (in units of length); and
• Eulerian angles: $\phi$, $\sigma$, and $\tau$ (in radians).
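As a reading aid for Equations (1) and (2), a commonly used way of writing a pinhole projection with Brown-Conrady distortion, consistent with the parameters and symbols listed above, is sketched below. The intermediate distorted co-ordinates $x_D$ and $y_D$, the pairing of $p_1$ and $p_2$ with the tangential terms, and the division by the pixel sizes are assumptions of this sketch and may differ from the exact formulation used in this work.

```latex
\begin{aligned}
x_D &= x_U \left(1 + k_1 d_U^2 + k_2 d_U^4\right) + p_2 \left(d_U^2 + 2 x_U^2\right) + 2 p_1 x_U y_U, \\
y_D &= y_U \left(1 + k_1 d_U^2 + k_2 d_U^4\right) + p_1 \left(d_U^2 + 2 y_U^2\right) + 2 p_2 x_U y_U, \\
c &= \frac{x_D}{s_c} + o_c, \qquad r = \frac{y_D}{s_r} + o_r, \\
x_U &= \frac{(\mathbf{x} - \mathbf{x}_c) \cdot \mathbf{e}_u}{(\mathbf{x} - \mathbf{x}_c) \cdot \mathbf{e}_f}, \qquad
y_U = \frac{(\mathbf{x} - \mathbf{x}_c) \cdot \mathbf{e}_v}{(\mathbf{x} - \mathbf{x}_c) \cdot \mathbf{e}_f}.
\end{aligned}
```

In this form the eight free intrinsic parameters are $k_1$, $k_2$, $p_1$, $p_2$, $s_c$, $s_r$, $o_c$, and $o_r$, and the six extrinsic ones enter through $\mathbf{x}_c$ and the unit vectors, which gives the 14 parameters counted above.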
The above equations (including the set of 14 parameters) are referred to as the "complete" model, or $M_1$. For most present-day cameras, it is reasonable to assume that the radial distortion is parabolic (i.e., $k_2 = 0$), the tangential distortion is negligible ($p_1 = p_2 = 0$), the pixels are square ($s_c = s_r$), and the decentering is also negligible ($o_c = k_c$ and $o_r = k_r$). These hypotheses lead to a "reduced" model, herein called $M_2$, with only 8 free parameters ($x_c$, $y_c$, $z_c$, $\phi$, $\sigma$, $\tau$, $k_1$, and $s_c$). Interestingly, the inversion of the model equations becomes explicit when model $M_2$ is considered (i.e., it reduces to solving a cubic equation).

Error Definition and Calibration Procedure
A Ground Control Point (GCP) is a 5-tuple comprising the real-world co-ordinates of a point and the corresponding pixel co-ordinates (column $c$ and row $r$) in an image; that is, $(x, y, z, c, r)$. For a set of $n$ GCPs $(x_i, y_i, z_i, c_i, r_i)$ and a camera model with given intrinsic and extrinsic parameters, following [6], the calibration error is defined as
$$\epsilon^* = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left[(c_i - c_i^*)^2 + (r_i - r_i^*)^2\right]}, \tag{3}$$
where $c_i^*$ and $r_i^*$ are the pixel co-ordinates obtained from the corresponding real-world co-ordinates (i.e., $(x_i, y_i, z_i)$) through the camera model for the given parameters. For a certain set of GCPs, an image is calibrated here by finding the parameters (intrinsic and extrinsic) that minimize the above error. The optimization method considered is Broyden-Fletcher-Goldfarb-Shanno (BFGS, [28]) combined with a Monte-Carlo-like seeding method. Usually, the calibration takes only a few CPU seconds.
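The calibration loop described above can be sketched as follows, assuming that the calibration error is the root mean square pixel distance between digitized and modelled GCP positions. The names `calibration_error` and `multistart`, the `project` callback, and the `bounds` used to draw random seeds are hypothetical; the local optimizer is passed in as a callable, so the sketch does not prescribe any particular BFGS implementation.

```python
import math
import random

def calibration_error(gcps, params, project):
    """RMS pixel distance between digitized and modelled GCP positions.

    gcps    : iterable of (x, y, z, c, r) tuples
    project : callable (point, params) -> (column, row); stands in for
              the camera model and is supplied by the caller
    """
    total = 0.0
    for x, y, z, c, r in gcps:
        c_mod, r_mod = project((x, y, z), params)
        total += (c - c_mod) ** 2 + (r - r_mod) ** 2
    return math.sqrt(total / len(gcps))

def multistart(objective, local_minimize, bounds, n_seeds=20, rng=None):
    """Monte-Carlo-like seeding around a local optimizer.

    local_minimize : callable (objective, start) -> refined parameters
    bounds         : list of (low, high) pairs used to draw random seeds
    """
    rng = rng or random.Random(0)
    best_params, best_value = None, math.inf
    for _ in range(n_seeds):
        start = [rng.uniform(low, high) for low, high in bounds]
        params = local_minimize(objective, start)
        value = objective(params)
        if value < best_value:  # keep the best local minimum found so far
            best_params, best_value = params, value
    return best_params, best_value
```

If SciPy is available, one possible choice for `local_minimize` is `lambda f, x0: scipy.optimize.minimize(f, x0, method="BFGS").x`. Restarting from several random seeds reduces the risk of accepting a poor local minimum, which matters when different parameter combinations fit the GCPs almost equally well.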
In practice, the pixel co-ordinates of GCPs are manually digitized by an expert user, with an unavoidable error that is usually on the order of a few pixels. Understanding how different factors (e.g., the obliquity, the number and distribution of the GCPs, or the mathematical model) affect the propagation of this error to the calibration quality is a key issue. For this reason, $J$ "perturbed" calibrations are performed for each of the cases analyzed in the following section. For each $j$ of these $J$ calibrations, each of the $n$ pixel co-ordinates of the GCPs, originally digitized at $(c_i, r_i)$, was randomly perturbed with a noise of ±2 pixels (px); that is, $(c_i + \xi_{ij}, r_i + \psi_{ij})$, where $\xi_{ij}$ and $\psi_{ij}$ are realizations of a uniformly distributed random variable in the range $[-2, +2]$. The calibration errors for each of these perturbations are referred to as $\epsilon_P^{(j)}$; that is,
$$\epsilon_P^{(j)} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left[(c_i + \xi_{ij} - \hat{c}_{ij})^2 + (r_i + \psi_{ij} - \hat{r}_{ij})^2\right]}, \tag{4}$$
where $(\hat{c}_{ij}, \hat{r}_{ij})$ is the pixel co-ordinate obtained from $(x_i, y_i, z_i)$ using the camera mathematical model and the $j$th perturbed calibration parameters. The error of the perturbed calibrations with respect to the originally digitized pixels is
$$\tilde{\epsilon}^2(i) = \frac{1}{J}\sum_{j=1}^{J}\left[|c_i - \hat{c}_{ij}|^2 + |r_i - \hat{r}_{ij}|^2\right]. \tag{5}$$
The above error is defined for each pixel of the set of GCPs. The Root Mean Square (RMS) over the set of GCPs is
$$\epsilon_Q = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\tilde{\epsilon}^2(i)}, \tag{6}$$
with $i$ running over all the $n$ pixels of the set of GCPs considered. The error $\epsilon_Q$ gives a measure of the quality of the calibration for a given set of GCPs. If the GCPs are the same set used to obtain the perturbed calibrations, the error will be referred to as $\epsilon_G$. When there are no pixel perturbations, $\xi_{ij} = \psi_{ij} = 0$ for all $j$ (and $i$) and Equation (4) reduces to Equation (3) (i.e., $\epsilon_P^{(j)} = \epsilon^*$ for all $j$). Furthermore, in the unperturbed case, as the perturbed calibrations become the original unperturbed ones, $\hat{c}_{ij} = c_i^*$ (for all $j$), such that, from Equation (5), $\tilde{\epsilon}^2(i) = |c_i - c_i^*|^2 + |r_i - r_i^*|^2$ and, from Equation (6), $\epsilon_G = \epsilon^*$ ($= \epsilon_P$).
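The perturbation experiment and the derived error measures can be sketched in a few lines of Python. The sketch assumes that the per-point error is the mean squared pixel distance over the $J$ perturbed calibrations and that the overall error is its RMS over the points considered; the function names are hypothetical, and the reprojected pixels are taken as given (they would come from the $J$ perturbed calibrations).

```python
import math
import random

def perturb_pixels(pixels, amplitude=2.0, rng=None):
    """Add uniform noise in [-amplitude, +amplitude] to each (c, r) pair."""
    rng = rng or random.Random()
    return [(c + rng.uniform(-amplitude, amplitude),
             r + rng.uniform(-amplitude, amplitude)) for c, r in pixels]

def per_point_error(pixel, reprojections):
    """Error at one point: root of the mean squared pixel distance between
    the digitized pixel and its J reprojections (one per perturbed calibration)."""
    c, r = pixel
    mean_sq = sum((c - cj) ** 2 + (r - rj) ** 2
                  for cj, rj in reprojections) / len(reprojections)
    return math.sqrt(mean_sq)

def rms_over_points(pixels, reprojections_per_point):
    """RMS of the per-point errors over a set of points.

    Evaluated over all available points this plays the role of the overall
    error; evaluated only over the GCPs used for calibration it plays the
    role of the error at the calibration points.
    """
    squares = [per_point_error(p, reps) ** 2
               for p, reps in zip(pixels, reprojections_per_point)]
    return math.sqrt(sum(squares) / len(squares))
```

The same `rms_over_points` function serves both measures; only the set of points passed in changes, which makes the contrast between fitting quality at the GCPs and quality over the whole image explicit.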

Experimental Setup
To gain a better understanding of the influence of different aspects on the quality of the calibration and the accuracy of the calibration parameters, a wide range of scenarios was analyzed. Two smartphone cameras were employed: a Samsung Galaxy Grand Prime (2048 × 1152 pixels) and a Xiaomi Redmi 10 (2016 × 1512 pixels). As both cameras gave equivalent results, only one of them (the Samsung) is presented below, for the sake of clarity. Three different snapshots were taken of the same grid (see Figure 3), in order to consider a range of obliquities (tilt $\tau$): $\tau \sim 55^\circ$ (angle $A_1$), $\tau \sim 40^\circ$ ($A_2$), and $\tau \sim 15^\circ$ ($A_3$, which is almost zenithal). The GCPs were easily obtained in these images as the intersections of the grid lines. The pixel co-ordinates of the GCPs were manually digitized with an error estimated as ∼2 px. The unit length in the real world, "u", was the side of the squares of the grid (around 5 cm).
For each of the three angles, eight different subsets ($S_0$ to $S_7$) of the whole set of grid intersections (∼80) were considered to be the GCPs for calibration. Figure 4 shows the eight subsets for the angle $A_1$; similar displays were considered for the other images in Figure 3 (although, necessarily, with some differences between the images). While $S_0$ considers all the available intersections of the grid as GCPs, the rest of the sets include eight GCPs distributed in different ways. Leaving aside the special case $S_0$, some sets correspond to (and are motivated by) real case conditions. For instance, the set $S_3$ resembles Figure 1A and the set $S_6$ resembles Figure 1B. The other sets were designed to analyze the results from a more theoretical point of view (e.g., see $S_1$ and $S_2$). The set $S_1$ would correspond to Figure 1B if the horizon line was included (the horizon line is not analyzed in this work). While eight is a reasonable number of GCPs in usual practice [3,14], and was considered for the reference case, similar displays with 6 and 12 GCPs were also considered for sets $S_1$ to $S_7$.
For each angle and subset of GCPs, and for each of the two models ($M_1$ and $M_2$), $J = 60$ perturbed calibrations were performed for analysis.

Results
The results for the two cameras, the three angles, the three GCP counts, the eight GCP distributions, and the two models are given in the Supplementary Materials. The main results are presented in this section. Figure 5 shows the distribution of the perturbed calibration errors $\epsilon_P$ for all the subsets of GCPs, for both models and for angle $A_1$ (the results for $A_2$ and $A_3$ were similar in this regard; not shown). Each boxplot summarizes the $J = 60$ perturbations. The calibration errors $\epsilon_P$ were smaller for $M_1$ than for $M_2$ for all subsets; this is a natural consequence of the model $M_2$ (with eight parameters) being a particular case of model $M_1$ (with 14 free parameters). However, it is noteworthy that model $M_2$, with around half as many parameters as $M_1$, still had small calibration errors, with $\epsilon_P \lesssim 3$ px. Also, from Figure 5, we can see that: (1) for $M_1$, the errors were larger for $S_0$ (i.e., using all the available points as GCPs); and (2) for $M_2$, the error was minimum for $S_2$ and $S_4$. Furthermore, there were no outliers; that is, all the calibrations can be considered to be satisfactory. As argued above, the error $\tilde{\epsilon}$ defined in Equation (5) gives us a better idea about the usability of the calibrations across the image. Figures 6 and 7 show the errors $\tilde{\epsilon}$ for all the available points for models $M_1$ and $M_2$, respectively, using the perturbed calibrations of the different subsets $S_k$ (the GCPs of the subsets $S_k$ are highlighted with small white circles, for ease of viewing). The results in Figures 6 and 7 are for the angle $A_1$ (the angles $A_2$ and $A_3$ showed the same trends, although with higher errors, as shown below through $\epsilon_Q$). As depicted in the figures, the errors remained small at the GCPs of each subset $S_k$.
The behaviour outside the region $S_k$ was better for model $M_2$ than for $M_1$, especially for the cases $S_2$ and $S_4$ to $S_7$; note that the color scale saturates at $\tilde{\epsilon} = 20$ px, whereas the errors increased up to $\sim 10^3$ px for $S_2$ and $S_4$ in model $M_1$. Recalling Equation (6), Figure 8 shows the RMS of the errors $\tilde{\epsilon}$ in Figures 6 and 7 for angle $A_1$ and for all subsets $S_k$. The error $\epsilon_Q$ considers all the available points in the image ($S_0$), while the error $\epsilon_G$ only considers the pixels used for the calibration (i.e., those highlighted in Figures 6 and 7) for the RMS. Naturally, both errors coincided for $S_0$. As already suggested by Figures 6 and 7, the errors $\epsilon_G$ were small in all cases; these errors were related to the errors $\epsilon_P$ in Figure 5. With regard to the error $\epsilon_Q$, which evaluates the quality of calibration in the whole image, model $M_2$ yielded significantly smaller errors than $M_1$, except for the very particular set $S_0$. For model $M_2$ (Figure 8B), all sets yielded overall errors $\epsilon_Q$ below 10 px, except for $S_2$ (pixels near the center of the image) and $S_4$ (centered in the lower half of the image). The sets $S_2$ and $S_4$ were those with the smallest errors $\epsilon_P$ and $\epsilon_G$. The sets with the smallest overall errors $\epsilon_Q$ were $S_1$ (ideal uniform distribution all over the image) and $S_3$ (lower half of the image), while sets $S_5$ to $S_7$ gave similar results.

Influence of the Obliquity and the Number of GCPs
The influence of the obliquity of the image on the errors $\epsilon_Q$ (as well as on $\epsilon_G$) is shown in Figure 9. This figure, an extension of Figure 8, includes the results for all three angles. The trends for angles $A_2$ and $A_3$ were, with respect to the model and the set $S_k$, similar to those described above for $A_1$. In particular, the errors $\epsilon_Q$ were, in general, too large for $M_1$ (despite the errors $\epsilon_G$ being very small). For model $M_2$, the errors $\epsilon_Q$ slightly increased for $A_2$ and $A_3$, with subsets $S_2$ and $S_4$ giving the highest overall errors $\epsilon_Q$.
Similarly, the influence of the number of GCPs (for sets $S_1$ to $S_7$) on the errors $\epsilon_Q$ and $\epsilon_G$ is shown in Figure 10 for $A_1$. Figure 10 is an extension of Figure 8 and also includes the results for 6 and 12 GCPs. From Figure 10, it can be seen that the model $M_2$ keeps the overall errors $\epsilon_Q$ small, even with only 6 GCPs, except for $S_2$ and $S_4$. With regard to model $M_1$, while the errors $\epsilon_Q$ decreased for 12 GCPs (relative to 8 GCPs), they were still larger than for $M_2$. In the following section, we again restrict the analysis to 8 GCPs for $S_1$ to $S_7$.
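A simple count of equations against unknowns helps to put the 6, 8, and 12 GCP cases in context. Each GCP contributes two scalar equations (one for the column and one for the row), while the complete and reduced models have 14 and 8 free parameters, respectively. The short sketch below (with hypothetical names `N_PARAMS`, `redundancy`, and `redundancy_table`) tabulates the difference; it is only a counting argument and says nothing about how well conditioned the problem is.

```python
# Free parameters of each model, as stated in the text.
N_PARAMS = {"M1": 14, "M2": 8}

def redundancy(n_gcps, model):
    """Scalar equations (two per GCP) minus free parameters.

    A negative value means fewer equations than unknowns.
    """
    return 2 * n_gcps - N_PARAMS[model]

def redundancy_table(gcp_counts=(6, 8, 12)):
    """Redundancy for every (number of GCPs, model) combination."""
    return {(n, m): redundancy(n, m) for n in gcp_counts for m in N_PARAMS}
```

With 6 GCPs the complete model has 12 equations for 14 unknowns, whereas the reduced model keeps a positive margin for all three GCP counts.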

Calibration Parameters
From a practical point of view, the above errors $\epsilon_Q$ are the most interesting results in the camera calibration problem. However, the recovery of the calibration parameters is also an issue of practical interest (e.g., recovering the camera position or the intrinsic parameters from a single snapshot). Figure 11 shows the results (always using all the $J$ perturbations) for the radial distortion $k_1$ and the pixel size $s_c$ for both models and for angle $A_1$. Please note that the intrinsic parameters ($k_1$ and $s_c$ for $M_2$, and many others in the complete model $M_1$) must be independent of the angle considered, whereas the extrinsic parameters depend on the image (angle). The figure contains the results for $A_1$; the results for $A_2$ and $A_3$ were similar (not shown). From Figure 11, the results for $M_1$ show a large variability when compared to those for model $M_2$. Model $M_2$ shows small standard deviations in the boxplots, except for sets $S_2$ and $S_4$ (in particular for the radial distortion $k_1$). Having small standard deviations means that all perturbed calibrations give similar values of the parameters, so that the results are trustworthy. The rest of the intrinsic parameters in model $M_1$ ($k_2$, $p_1$, ...) behave similarly to $k_1$ and $s_c$ (i.e., with large standard deviations; not shown). Given that model $M_2$ performed similarly to $M_1$ in terms of $\epsilon_G$, while giving smaller overall errors $\epsilon_Q$ (Figure 8) and, further, providing more trustworthy results for the intrinsic parameters, we will focus on $M_2$ for the extrinsic parameters (model $M_1$ provides noisy results for the extrinsic parameters, as it does for the intrinsic ones; not shown).
The extrinsic parameters ($x_c$, $y_c$, $z_c$, $\phi$, $\sigma$, and $\tau$) depend on the image (angle) considered, as already mentioned. Figure 12 shows the results for the camera position ($x_c$, $y_c$, and $z_c$) for angles $A_1$ to $A_3$ using the reduced model $M_2$. For each angle, given that the results for $S_0$ (with ∼80 GCPs) had the smallest standard deviation (i.e., were the most trustworthy), the mean value for $S_0$ ($x_{c,S_0}$, $y_{c,S_0}$, $z_{c,S_0}$) was subtracted in all cases. In this way, the variability of the parameter is shown for each angle $A_i$ independently of the actual values of the parameters, which are of minor interest here (and different for all three cases). From Figure 12, angle $A_1$ (with the largest obliquity) produced good estimates of the camera position, except (again) when using sets $S_2$ and $S_4$. The results worsened for angles $A_2$ and, especially, $A_3$ (∼zenithal). The results for the Eulerian angles $\phi$, $\sigma$, and $\tau$ are shown in Figure 13. The results followed the same trends as for the camera position; that is, case $A_1$ gave more robust results than $A_2$ and much more robust results than $A_3$, and $S_2$ and $S_4$ performed especially badly. It is worth noting that $\tau_{S_0} = 0.95$ rad $\approx 54^\circ$ for $A_1$, $\tau_{S_0} = 0.76$ rad $\approx 44^\circ$ for $A_2$, and $\tau_{S_0} = 0.37$ rad $\approx 21^\circ$ for $A_3$.
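The centering used for these comparisons (subtracting the mean obtained with the full set of GCPs) and the spread of each subset can be sketched as follows. The function name `centered_spread`, the dictionary layout, and the choice of the population standard deviation are assumptions made for illustration only; the input would be the values of one parameter recovered by the perturbed calibrations of each subset.

```python
import statistics

def centered_spread(estimates_by_subset, reference="S0"):
    """Offset of each subset's mean from the reference mean, and its spread.

    estimates_by_subset : dict mapping a subset name to the list of values
                          of one parameter over the perturbed calibrations
    Returns a dict: subset name -> (mean offset, standard deviation).
    """
    ref_mean = statistics.fmean(estimates_by_subset[reference])
    return {
        name: (statistics.fmean(values) - ref_mean, statistics.pstdev(values))
        for name, values in estimates_by_subset.items()
    }
```

A small standard deviation indicates that the perturbed calibrations agree on the parameter, while a large offset from the reference mean indicates a systematic shift of the estimate for that subset.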

Discussion
The above results on full calibration of a camera from a single snapshot show that there is no correlation between the overall quality of the calibration (which can be measured in terms of $\epsilon_Q$) and the error obtained in the optimization process used to find the calibration parameters. However, in real calibrations, the error $\epsilon_Q$ cannot be known; only $\epsilon^*$ (similar to $\epsilon_P$ and $\epsilon_G$) can be obtained. Small values of the latter errors only ensure, in general, good performance of the calibration around the calibration GCPs (Figures 6 and 7 are clear in this regard).
The results show that the choice of GCPs is crucial to obtain an effective real calibration (i.e., minimal $\epsilon_Q$ values). Ideally, the overall calibration errors $\epsilon_Q$ should be minimized by using a large number of GCPs covering the entire image. However, in real situations, the calibration GCPs are limited to a small region of the image, while other parts of the image can be of interest to the research. For example, in Figure 1B, the GCPs would usually be located on the promenade (where there are many observable features), while the focus is on the shoreline or the water area. Furthermore, the number of GCPs is limited by functional requirements. Our findings show that good-quality calibrations can be obtained with a limited number of GCPs when at least some of them are placed at the edges of the image. In these cases, even without having the smallest $\epsilon_G$, the $\epsilon_Q$ errors are small. On the other hand, when all the GCPs are centered in the image, the calibration quality may be poor (large $\epsilon_Q$), even if $\epsilon_G$ is small. The justification for this and other behaviours is presented below.
The selection of an appropriate calibration model is essential. Ideally, when a large number of GCPs are available and cover the whole image, the complete model ($M_1$) is the best, both with regard to $\epsilon_G$ and $\epsilon_Q$ (Figure 8 for $S_0$). This can typically be done under laboratory conditions but is not the case in coastal studies, particularly when taking advantage of freely available coastal images. For a realistic set of GCPs, the reduced model $M_2$ provided, in all studied cases, the highest quality calibrations. Again, we found the somewhat paradoxical result that the best $\epsilon_Q$ were obtained with the model $M_2$, although the calibration errors were always smaller in model $M_1$, which could therefore seem to be more robust. From the above results (Figure 10), the advantage of model $M_2$ over $M_1$ is evident for a reduced number of GCPs (6), and it remains even when the number is increased to more reasonable values (12).
The better behaviour of model $M_2$ relative to $M_1$ is related to the noise in the recovery of the calibration parameters for model $M_1$ (illustrated in Figure 11 for $k_1$ and $s_c$), as explained below. Having just one snapshot to perform the calibration may lead, especially if the GCP distribution is not favourable (as in $S_2$ or $S_4$), to many different combinations of parameters providing small calibration errors ($\epsilon^* \sim \epsilon_G$) but large overall errors $\epsilon_Q$. In the complete model, $M_1$, this compensation between different calibration variables to give similar calibration errors $\epsilon^*$ is much more pronounced, as the model contains more parameters: this explains the large deviations of the parameters $k_1$ and $s_c$ in Figure 11 (and also of the rest of the calibration parameters; not shown) and the larger errors $\epsilon_Q$, except for $S_0$ (Figures 9 and 10). Model $M_1$ was overparametrized for 6 GCPs and, for 8 and 12 GCPs, still showed symptoms of overparametrization. Focusing on the simple model, $M_2$, the above compensation mechanism shows up in the worst case, $S_2$ (and in $S_4$). In model $M_2$, the roles of the physical distance from the camera position to the GCPs (i.e., the co-ordinates of $\mathbf{x}_c$), the pixel size $s_c$, and the distortion $k_1$ can compensate for one another if the GCPs fall near the center of the image, where the role of the distortion cannot be clearly distinguished. This is the reason why the set $S_2$ showed large deviations in the camera position (see Figure 12) and in $k_1$ (Figure 11 for $M_2$). For this model, these mechanisms were enhanced for small $\tau$ (angle $A_3$, Figures 12 and 13), giving slightly larger errors $\epsilon_Q$ in Figure 9. The angle $A_1$ gave more robust results (in the calibration parameters) because the larger relative distances between the different GCPs allowed the calibration parameters to be captured more accurately.
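The weak role of the distortion near the image center can be made concrete with a small numerical sketch. It assumes an $M_2$-like model in which the distorted radius is $d_U (1 + k_1 d_U^2)$ and pixels are obtained by dividing by the pixel size, so that the radial term displaces a point by $|k_1| d_U^3 / s_c$ pixels. The function names and any numerical values used with them are hypothetical and are not taken from the experiments.

```python
def radial_shift_px(d_u, k1, s):
    """Pixel displacement produced by the radial term at normalized radius d_u,
    under the assumed form d_D = d_u * (1 + k1 * d_u**2)."""
    return abs(k1) * d_u ** 3 / s

def detectable_radius(noise_px, k1, s):
    """Normalized radius at which the radial displacement equals a given
    digitization noise; closer to the center, the noise dominates."""
    return (noise_px * s / abs(k1)) ** (1.0 / 3.0)
```

For illustration, with hypothetical values $k_1 = 0.1$ and $s_c = 1/1500$, `radial_shift_px(0.1, 0.1, 1 / 1500)` is about 0.15 px, well below a digitization noise of 2 px, whereas `radial_shift_px(0.6, 0.1, 1 / 1500)` is about 32 px. GCPs clustered near the center therefore carry almost no information on $k_1$, which is consistent with the compensation between parameters described above.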
Zenithal images with the GCPs concentrated in the center of the image led to the largest overall errors $\epsilon_Q$, despite achieving an excellent calibration error $\epsilon_G$ (Figure 9, set $S_2$).
For calibration purposes, we recommend the use of model $M_2$ and the selection of the GCPs such that some of them fall near the edges of the image. Whenever the recovery of the camera position and orientation is of interest, zenithal views should be avoided. The use of the simple model $M_2$ to properly georeference images obtained by different devices using just a few GCPs opens up a range of possibilities for the analysis of images from webcams or beach users and the quantification of different parameters of interest (e.g., position and shape of the coastline). Furthermore, in fixed video monitoring systems, even if the camera has been intrinsically calibrated prior to its final deployment, the intrinsic calibration (as well as the extrinsic one) can change in time, due to changing external conditions, and continuous re-calibration of the parameters may be required.

Conclusions
In this work, we analyzed the influence that the distribution of GCPs and the image obliquity have on the overall quality of full (intrinsic and extrinsic) camera calibration using only a single snapshot. This was done by analyzing the performance of two mathematical calibration models. We conclude that, for the calibration of coastal images, especially when only one image is available, the reduced model should be used. This reduced model provided robust camera calibration parameters (camera position, Eulerian angles, pixel size, and radial distortion) in our tests, allowed for an explicit transformation from pixel to real-world co-ordinates and, most importantly, yielded smaller overall calibration errors. With respect to the distribution of the GCPs over the image, using calibration points only near the centre of the image must be avoided, and we recommend using the maximum number of points distributed along the edges of the image. Finally, zenithal views complicate the recovery of the calibration parameters, although the obliquity does not have a significant influence on the overall performance of the calibration.