Surveillance Video Georeference Method Based on Real Scene Model with Geometry Priors

Abstract: With the comprehensive promotion of digital construction in China, cameras scattered throughout the country are of great significance in obtaining first-hand data. However, their potential role is limited because current surveillance cameras lack georeference information. If surveillance camera images and real scenes are combined and given georeference information, this problem can be solved, allowing cameras to generate significant social benefits. This article proposes an accurate registration method based on misalignment calibration and least squares matching between real-scene and surveillance camera images to address this issue. Firstly, the navigation coordinate system in which cameras obtain data is converted to the photogrammetric coordinate system, and the misalignment and internal orientation elements of the camera are solved. Then, accurate registration is achieved using least squares matching on pyramid images. The experiment obtained surrounding image data of two common scenes with lens pitch angles of 45°, 55°, 65°, 75°, and 85° using the surveillance camera and obtained a 3D real scene model of each scene using a low-altitude aircraft. The experimental results show that the proposed method can achieve the expected goals of accurately matching real-scene and surveillance camera images and assigning georeference information. Through extensive data analysis, the success rate and accuracy rate of registration are 98.1% and 97.06%, respectively.


Introduction
With the rapid progress of urbanization in China, video surveillance covers urban buildings, roads, military strongholds, factories, and so on [1][2][3], and is responsible for public security management in cities, road control, and the detection of illegal intrusion and illegal operation. However, the amount of data is massive, making it difficult to observe the region of interest. When the scale of the surveillance system exceeds the monitoring capabilities of humans, security operators must mentally map each surveillance monitor image to a corresponding area in the real world. This process is very abstract and requires prior training for viewers [4]. Thus, the traditional method of manually watching and analyzing videos is no longer applicable, and intelligent video surveillance systems have emerged as the times require. In recent years, intelligent monitoring devices have developed rapidly and have achieved integration with Geographic Information Systems (GIS). However, there is still a problem of insufficient registration accuracy, resulting in low positioning accuracy.
Although thousands of cameras collect a large amount of data every day [3], their greater role has not been fully realized. The most important drawback is that existing cameras do not have georeference information. By combining the image information obtained by the camera with geographic information, real-time information about a certain location can be retrieved quickly. Currently, China is fully promoting digital construction, and the research in this paper provides a significant theory and method for promoting Smart City development. With the combination of cameras and geographic information, cameras will no longer only play a monitoring role; instead, each camera will be a powerful data source and basis for urban resource monitoring, urban security management, forest fire prevention, and other applications. If a traffic accident occurs somewhere in the city, the relevant surveillance video of the accident location cannot be retrieved quickly without georeferencing [5]. Similarly, a fire in the forest or an illegal building discovered at a certain location cannot be located quickly. If intelligent monitoring with thermal sensors and actuator systems is integrated, the temperature of an ignition point can be monitored through thermal sensors and immediate feedback provided to the fire department, thereby minimizing losses and even achieving the goal of preventing fires.
Therefore, this paper proposes a camera georeference method based on misalignment calibration and least squares image matching, which solves the problem that an image cannot be quickly and accurately located through the camera and achieves the integration of image information and GIS information [6].
The innovation of this paper lies in proposing a new mathematical model to calculate camera parameters and misalignment parameters, as well as a method to achieve accurate registration of surveillance camera images and real scenes. Rapid and accurate matching of real-scene information and surveillance camera information is realized, and the goal of quickly locating a place of interest and retrieving relevant images is achieved [7].
The remainder of this paper is organized as follows: Section 2 presents related works regarding video surveillance, the integration of real-scene information and GIS information, and camera calibration. Section 3 describes the proposed method and its key steps in detail, including the transformation of the navigation coordinate system, a method for obtaining the internal parameters and misalignment parameters, and a method using least squares image matching based on pyramid images to achieve accurate registration. Section 4 presents the implementation of the proposed method and the experimental results obtained. Finally, Section 5 presents the conclusions and prospects for future research.

Related Works
As early as 1942, Siemens AG installed the first video surveillance system in Germany to monitor the launch of V-2 rockets [8]. Later, in order to combat crime, the US installed video surveillance on its main commercial streets in 1968. The above are all traditional systems based on a matrix of video displays, maps, and indirect controls. In contrast, the goal of intelligent video surveillance is to efficiently extract useful information from a large amount of surveillance video by automatically detecting, tracking, and identifying objects of interest and understanding and analyzing their activities.
Modern video surveillance systems rely on automation through intelligent video surveillance and better display of surveillance data through context-aware solutions and integration with virtual GIS environments [9]. Souleiman et al. used geospatial data for camera pose estimation and conducted 3D building reconstruction; they proposed a method based on GPS measurements, video sequences, and registration to rough 3D building models [10]. Schall et al. proposed a method that relies on GPS and an inertial measurement unit (IMU) to perform camera attitude estimation, thereby enhancing the visualization of underground GIS infrastructure in augmented reality applications [11]. Lewis et al. made use of georeferenced video data, focused on using Viewpoint data structures to represent video frames for geospatial analysis, and considered the potential of spatial video as georeferenced video data [12]. Xie et al. proposed the integration of GIS and moving objects in surveillance videos by using motion detection and spatial mapping [13]. Robert T. Collins et al. developed the VSAM testbed system for video surveillance and monitoring over three years; the system can automatically track targets [4]. The purpose of the above work is to achieve the integration of image information and GIS information with the aim of enhancing reality; however, these methods cannot achieve accurate matching between real scenes and surveillance camera images.
In terms of camera calibration, Zhang proposed a simple camera calibration technique to determine radial distortion by observing a planar pattern shown at a few different orientations [14]. Lee and Nevatia developed a video surveillance camera calibration tool for urban environments that relies on vanishing point extraction [15]. Vanishing points are easily obtainable in urban environments since there are many parallel lines, such as street lines, light poles, and buildings. The calibration of environmental camera images by means of the Levenberg-Marquardt method has been studied by Muñoz et al. [16]. Although these correction methods perform well, they are not universal. Based on the characteristics of the information obtained from real scenes and surveillance camera images, a new mathematical correction model to solve camera parameters is proposed in this paper.
In research on automatic feature point detection, many researchers have compared and analyzed various extraction algorithms [17][18][19]. In addition, Remondino et al. noted in 2014 that image matching is one of the key steps in 3D modeling and mapping [20]. Saleem et al. conducted a study of matching between remote sensing images and UAV imagery in 2016 [21]. In 2017, Xiaohui Yuan et al. proposed a method that uses a time-of-flight camera for feature point detection and action tracking [22].
Although many people have proposed good ideas and put them into practice, there are still many shortcomings:

• Different feature points were used over the same field, and their performance was compared;
• For traditional monitoring, when the scale of the surveillance system exceeds the monitoring capabilities of humans, security operators must mentally map each surveillance monitor image to a corresponding area in the real world [9]; this method is manually operated, so it has great automation potential;
• These methods are unable to achieve accurate registration between images and the actual ground.
Our research overcomes the above problems, achieves the integration of image information and real scenes, and realizes fast and accurate matching. In recent years, China has vigorously promoted digitization, and this research can provide powerful theories and methods for the progress and development of digital cities, making important contributions to China's social development and urban progress.

Methods
In order to achieve accurate registration of surveillance camera images and real scenes, the first step is to convert the position and attitude parameters obtained by the surveillance camera into the photogrammetric coordinate system. The second step is to calibrate the camera and misalignment parameters. Then, the surveillance camera images and real scenes are accurately registered using least squares matching based on geometry priors. The framework of the integration of surveillance video images and real scenes is shown in Figure 1.

Coordinate System Transformation for Surveillance Video
In the process of the surveillance camera collecting data, the position and attitude recorded by the surveillance equipment are based on the navigation coordinate system [23]. However, the actual application operates under the map projection frame, so the navigation coordinate system needs to be converted into the photogrammetric coordinate system.
Generally, the navigation coordinate system is represented by Yaw, Pitch, and Roll, while the exterior orientation parameters (EOs) of each image frame in the photogrammetric coordinate system are generally represented by ω, ϕ, κ, referred to as the OPK angle system. In order to convert (Yaw, Pitch, Roll) into (ω, ϕ, κ), the definitions of the different coordinate systems and rotation angles must be considered. The reference frames and their representations used in this paper are shown in Table 1.

Table 1. Overview of the required frames.

Frames                  Abbreviation
Navigation frame        g
Body frame              u
Camera frame            c
Map projection frame    s

In addition, it is necessary to consider the mapping system used, as well as the impact of the curvature and meridian deviations of the Earth on the angles [24]. The z-axes of the navigation coordinate system and the projection coordinate system both point upward along the ellipsoidal normal, but the y-axis of the navigation coordinate system points toward the true north direction, while the y-axis of the projection coordinate system points toward the grid north direction. Both x-axes are perpendicular to the plane composed of their respective y-axis and z-axis. The relationship between the navigation coordinate system and the projection coordinate system is shown in Figure 2.
The meridian deviation mainly affects the orientation relative to the geographic orientation [24], and it can be computed from the standard series expansion

$$\gamma = \lambda \sin\beta + \frac{\lambda^{3}}{3}\sin\beta\cos^{2}\beta\left(1 + 3\eta^{2} + 2\eta^{4}\right) + \frac{\lambda^{5}}{15}\sin\beta\cos^{4}\beta\left(2 - t^{2}\right)$$

where t = tan β, η = e′ cos β, and e′ is the second eccentricity. β is the latitude of the projective point, and λ is the longitudinal difference between the projective point and the central meridian of the universal transverse Mercator (UTM) coordinate system. Due to the meridian deviation, there is a distortion in the north direction, recorded as γ. To eliminate the effects of the meridian deviation, the coordinate system must be rotated by γ around the Zn-axis; therefore, a transformation matrix R_g^{g'} is required to compensate for the meridian deviation, where g' is the navigation coordinate system in which the meridian convergence has been eliminated. In all of the following formulas, the lower index denotes the original system and the upper index denotes the target system.
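As a numerical illustration, the meridian convergence γ can be evaluated directly from the series above. The following Python sketch assumes angles in radians and the WGS84 value of the squared second eccentricity; the function name is ours.

```python
import numpy as np

def meridian_convergence(beta, lam, e2_prime=0.00673949674228):
    """Meridian convergence gamma from the series expansion.

    beta     : latitude of the projective point (radians)
    lam      : longitude difference to the UTM central meridian (radians)
    e2_prime : squared second eccentricity (WGS84 value assumed here)
    """
    t = np.tan(beta)
    eta2 = e2_prime * np.cos(beta) ** 2          # eta^2 = e'^2 cos^2(beta)
    gamma = (lam * np.sin(beta)
             + (lam ** 3 / 3.0) * np.sin(beta) * np.cos(beta) ** 2
               * (1.0 + 3.0 * eta2 + 2.0 * eta2 ** 2)
             + (lam ** 5 / 15.0) * np.sin(beta) * np.cos(beta) ** 4
               * (2.0 - t ** 2))
    return gamma
```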
Due to the different directions of the coordinate axes in navigation and in photogrammetry, two additional transformation matrices are needed to obtain an equivalently oriented system, namely R_u^c from the body coordinate system (u) to the camera coordinate system (c), and R_s^{g'} from the projection coordinate system (s) to the navigation coordinate system in which the meridian convergence has been eliminated (g'). The transformation R_g^u from the navigation coordinate system (g) to the body coordinate system (u) is composed of successive rotations by the attitude angles, where φ is Yaw, θ is Pitch, and ψ is Roll.
Combining the above transformation matrices yields the photogrammetric rotation matrix

$$R_{c}^{s} = \left(R_{s}^{g'}\right)^{-1} R_{g}^{g'} \left(R_{g}^{u}\right)^{-1} \left(R_{u}^{c}\right)^{-1}$$

from which the attitude angles of the images (ϕ, ω, κ) are extracted.
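To make the composition concrete, the following Python sketch builds the chain above with NumPy. It is a minimal illustration, not the authors' implementation: the z-y-x rotation sequence for (Yaw, Pitch, Roll) is an assumed convention, R_u_c and R_s_gp are constant matrices that depend on the actual sensor mounting and mapping frame definitions, and all function names are ours.

```python
import numpy as np

def rot_x(a):
    return np.array([[1, 0, 0],
                     [0, np.cos(a), -np.sin(a)],
                     [0, np.sin(a),  np.cos(a)]])

def rot_y(a):
    return np.array([[ np.cos(a), 0, np.sin(a)],
                     [0, 1, 0],
                     [-np.sin(a), 0, np.cos(a)]])

def rot_z(a):
    return np.array([[np.cos(a), -np.sin(a), 0],
                     [np.sin(a),  np.cos(a), 0],
                     [0, 0, 1]])

def camera_to_map_rotation(yaw, pitch, roll, gamma, R_u_c, R_s_gp):
    """Compose R_c^s from the individual frame transformations.

    yaw, pitch, roll : navigation attitude angles (radians)
    gamma            : meridian convergence (radians)
    R_u_c            : body-to-camera matrix (mounting-dependent assumption)
    R_s_gp           : projection-to-g' matrix (mapping-dependent assumption)
    """
    R_g_u  = rot_x(roll) @ rot_y(pitch) @ rot_z(yaw)  # g -> u, assumed z-y-x sequence
    R_g_gp = rot_z(gamma)                             # g -> g', removes meridian convergence
    # chain c -> u -> g -> g' -> s; inverses become transposes for rotations
    return R_s_gp.T @ R_g_gp @ R_g_u.T @ R_u_c.T
```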

Camera and Misalignment Calibration for Surveillance Video
The calibration method in this paper uses existing real scenes to pick up control points. The real scenes used in this paper are all obtained from drone images processed by ContextCapture to ensure their accuracy. However, due to the low resolution of the existing real scene model, very few feature points can be extracted in weak-texture regions. Relying solely on a single camera direction to obtain control points is not sufficient for camera calibration, so surrounding image data need to be obtained. The entire solution process is shown in Figure 3.
The surveillance camera records the relative attitude R_rel between images with high accuracy. Moreover, due to various uncertainties during installation, the camera may have misalignment, resulting in the camera not being horizontal or not pointing to the specified zero direction. Only the camera parameters, the misalignment of the surveillance camera, and the position of the surveillance camera are considered unknowns. The initial values of the camera parameters can be obtained by the EPnP method [25]. Based on the principle of spatial resection, the calibration model can be formulated as

$$\lambda \begin{bmatrix} x - x_0 \\ y - y_0 \\ -f \end{bmatrix} = \left(R_{rel}\, R_{mis}\right)^{T} \begin{bmatrix} X - X_S \\ Y - Y_S \\ Z - Z_S \end{bmatrix}$$

where (X, Y, Z) is the coordinate of the ground control point and λ is the scaling factor. The corresponding image point (x, y) is the observation. The unknowns include the principal distance f, the principal point (x0, y0), the perspective center (X_S, Y_S, Z_S), and the misalignment R_mis. R_mis is the rotation matrix that rotates from the placement direction to the zero direction.
From the above analysis, it can be seen that this model has nine unknowns, and each ground control point provides one image point coordinate observation, from which two error equations can be formulated; therefore, at least five non-coplanar and relatively evenly distributed control points are required. Take Mount Tai Square of Shandong University of Science and Technology, captured from a low-altitude aircraft, as an example. The schematic diagram is shown in Figure 4.
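For illustration, the sketch below projects a ground control point through the calibration model above, i.e., the collinearity equations with the composed rotation R_rel · R_mis. It is a minimal reading of the model, not the authors' code; the function name and argument layout are ours.

```python
import numpy as np

def project_control_point(P, S, f, x0, y0, R_rel, R_mis):
    """Project a ground control point P = (X, Y, Z) into image coordinates.

    S         : perspective center (X_S, Y_S, Z_S)
    f, x0, y0 : principal distance and principal point
    R_rel     : relative attitude recorded by the camera
    R_mis     : misalignment rotation (the unknown to be calibrated)
    """
    R = R_rel @ R_mis                                # total image rotation
    d = R.T @ (np.asarray(P, float) - np.asarray(S, float))  # camera-frame ray
    x = x0 - f * d[0] / d[2]                         # collinearity equations
    y = y0 - f * d[1] / d[2]
    return np.array([x, y])
```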
Linearizing the above equation yields the error equation, shown in Equation (7):

$$V = A X_1 + B X_2 + C X_3 - L \tag{7}$$

where ϕ_mis, ω_mis, κ_mis are the three misalignment angles, L denotes the constant terms, and A, B, C are the coefficient matrices. The initial values of the parameters are ϕ_mis = 0, ω_mis = 0, κ_mis = 0, x0 = width/2, y0 = height/2; the initial value of f is the focal length of the camera, and the initial values of the other parameters can be provided by the triangulation method.
Then, simplifying the above, the matrix form of the error equation can be expressed as Equation (8), and the normal equation is constructed as Equation (9):

$$V = \begin{bmatrix} A & B & C \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix} - L \tag{8}$$

$$\begin{bmatrix} A & B & C \end{bmatrix}^{T} \begin{bmatrix} A & B & C \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix} = \begin{bmatrix} A & B & C \end{bmatrix}^{T} L \tag{9}$$
Solving the normal equation yields X_1, X_2, and X_3, which include the corrections to the internal parameters and the EOs of the first image. The correction values are added to the initial values, and the process is iterated until the obtained corrections are smaller than the allowable error.
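A minimal sketch of this iteration loop is given below, assuming a user-supplied routine that evaluates the stacked coefficient matrix J = [A | B | C] and constant terms L of Equation (7) at the current parameter values. The unit-weight normal equations and all names are our assumptions.

```python
import numpy as np

def calibrate(params, residual_jacobian, tol=1e-8, max_iter=50):
    """Iterative solution of the error equations V = A X1 + B X2 + C X3 - L.

    params : initial parameter vector (misalignment angles, f, x0, y0,
             perspective center), e.g. from EPnP / triangulation
    residual_jacobian : callable returning (J, L) with J = [A | B | C]
    """
    x = np.asarray(params, dtype=float)
    for _ in range(max_iter):
        J, L = residual_jacobian(x)
        # normal equations: (J^T J) dx = J^T L
        dx = np.linalg.solve(J.T @ J, J.T @ L)
        x += dx
        if np.max(np.abs(dx)) < tol:       # corrections below the tolerance
            break
    return x
```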

Accurate Registration Method with Geometry Priors
After calibration, preliminary registration of surveillance camera images and real scenes has been achieved, but due to residual attitude angle errors from the previous step, strict registration cannot yet be achieved. Therefore, the least squares method is also needed for accurate registration. The least squares image matching process is shown in Figure 5. Considering that projected images of real scenes may have linear gray-scale distortions compared to images captured by cameras,

$$g_1(x, y) = h_0 + h_1\, g_2\big(a_0 + f(x),\; b_0 + f(y)\big) \tag{10}$$

where g1(x, y) represents a point on the camera image, g2(a0 + f(x), b0 + f(y)) represents the corresponding point on the real scene, h0 and h1 are the radiometric deformation correction parameters, a0 and b0 are the offsets, and f(x) and f(y) are the x- and y-coordinates of the point on the real scene. Linearizing this equation, the error equation for least squares image matching can be obtained:

$$v = \mathrm{d}h_0 + g_2\,\mathrm{d}h_1 + h_1 g_x\,\mathrm{d}a_0 + h_1 g_y\,\mathrm{d}b_0 - \Delta g \tag{11}$$

where g_x and g_y denote the gray-level gradients of the real-scene image. The initial values are set as h0 = 0, h1 = 1, a0 = 0, b0 = 0, and the observed value Δg is the gray-scale difference of the corresponding pixels. Next, the coefficients of the error equation in Equation (11) are assembled in matrix form and the normal equations are solved. Finally, the obtained corrections are compared with the tolerance to determine whether to continue the iteration.
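The sketch below illustrates one way to implement this least squares matching step in Python for a single template window, with the radiometric parameters (h0, h1) and the shifts (a0, b0) as unknowns, following Equations (10) and (11). The bilinear resampling and Sobel-based gradients are our implementation choices, not prescribed by the paper.

```python
import numpy as np
from scipy.ndimage import map_coordinates, sobel

def lsm_refine(g1_patch, g2, xc, yc, iters=50, tol=1e-3):
    """Least squares matching of a camera-image patch to the projected scene.

    g1_patch : template window from the camera image (float array)
    g2       : projected real-scene image (float array)
    xc, yc   : initial patch center in g2 (from the rough registration)
    """
    h, w = g1_patch.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    g2x = sobel(g2, axis=1) / 8.0                  # x-gradient of the scene image
    g2y = sobel(g2, axis=0) / 8.0
    h0, h1, a0, b0 = 0.0, 1.0, 0.0, 0.0            # initial values from the text
    for _ in range(iters):
        u = xs + (xc - w // 2) + a0                # resampling coordinates in g2
        v = ys + (yc - h // 2) + b0
        g2w = map_coordinates(g2, [v, u], order=1)
        gx = map_coordinates(g2x, [v, u], order=1)
        gy = map_coordinates(g2y, [v, u], order=1)
        dg = (g1_patch - (h0 + h1 * g2w)).ravel()  # observations: gray differences
        # coefficients for dh0, dh1, da0, db0 (cf. Equation (11))
        A = np.stack([np.ones(dg.size), g2w.ravel(),
                      (h1 * gx).ravel(), (h1 * gy).ravel()], axis=1)
        dx = np.linalg.lstsq(A, dg, rcond=None)[0]
        h0 += dx[0]; h1 += dx[1]; a0 += dx[2]; b0 += dx[3]
        if max(abs(dx[2]), abs(dx[3])) < tol:      # shift corrections converged
            break
    return a0, b0, h0, h1
```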
Accurate matching can be achieved by completing the above operations, but with large data volumes the matching efficiency is greatly reduced. Therefore, pyramid image matching is considered. By constructing pyramid images, a matching strategy from top to bottom and from coarse to fine is adopted to achieve fast and accurate image matching. The basic principle is that, due to low-pass filtering and sampling, the top of the pyramid retains the most obvious, energy-intensive, and large-scale feature structures, while small-scale and weak textures are annihilated by repeated smoothing. Because the top of the pyramid is generated by multiple filtering passes, it mainly contains low-frequency components; therefore, feature matching at the highest level of the pyramid is more robust for features that are structurally large and have strong contrast. The projected image can be obtained by rasterizing the triangular mesh of the real scene; OpenMVS provides a function named "Raster" that can convert a triangular mesh into a depth map. Based on this, the texture can be projected onto the image to ensure its projection accuracy. The entire accurate registration process is shown in Figure 6.
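A coarse-to-fine sketch following this strategy is shown below: it builds a Gaussian pyramid and propagates the shift found at each level down to the next, reusing the lsm_refine sketch above. The pyramid depth, window size, and smoothing parameters are illustrative assumptions, and border handling is omitted for brevity.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def build_pyramid(img, levels=4):
    """Gaussian pyramid: low-pass filter, then subsample by 2 at each level."""
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(gaussian_filter(pyr[-1], sigma=1.0)[::2, ::2])
    return pyr  # pyr[0] is full resolution, pyr[-1] is the top

def coarse_to_fine_match(g1, g2, x1, y1, x2, y2, patch=21, levels=4):
    """Refine the match of (x1, y1) in g1 to (x2, y2) in g2, top-down."""
    p1, p2 = build_pyramid(g1, levels), build_pyramid(g2, levels)
    r = patch // 2
    for lvl in range(levels - 1, -1, -1):          # from coarse to fine
        s = 2 ** lvl
        cx, cy = int(round(x1 / s)), int(round(y1 / s))
        tpl = p1[lvl][cy - r:cy + r + 1, cx - r:cx + r + 1]
        a0, b0, _, _ = lsm_refine(tpl, p2[lvl], x2 / s, y2 / s)
        x2, y2 = x2 + a0 * s, y2 + b0 * s          # propagate the refined shift
    return x2, y2
```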

Description of Experimental Equipment
The surveillance camera equipment used in this experiment is the DH-SD-8A1440XA-HNR, with a minimum focal length of 5.5 mm and a maximum focal length of 220 mm. It is equipped with a 1/1.8-inch CMOS sensor in which the image size is 2560 × 1440, and the pixel size is about 1.97 µm. It has a horizontal field of view (FOV) of 61.4° to 2.27° and a vertical FOV of 35.99° to 1.3°. The heading angle can rotate continuously from 0 to 360°, and the pitch angle ranges between −30° and 90° for continuous monitoring. The experimental equipment parameters are shown in Table 2.

To verify the performance of the proposed method, systematic experiments and analyses are performed using the surveillance camera equipment. In this experiment, surrounding image data with lens pitch angles of 45°, 55°, 65°, 75°, and 85° were collected.

Result and Analysis of Camera Calibration
Considering that feature point extraction is difficult in some weak-texture fields, and that the limited feature points obtained by the camera in a single direction are not sufficient for camera calibration, surrounding image data are obtained to solve the problem. Take Scene 1 as an example: eight feature points (g1, g2, g3, g4, g5, g6, g7, g8) were picked on the 3D real scene model, and the camera captured a total of six images of surrounding image data. On the 3D real scene model, feature points selected on planes yield control point coordinates with an accuracy of about 2 cm; edge points cannot be picked because their error is significant. The first image contains two feature points (g1, g2), the second image contains three feature points (g2, g3, g4), the third image contains two feature points (g4, g5), the fourth image contains two feature points (g5, g6), the fifth image contains three feature points (g5, g6, g7), and the sixth image contains three feature points (g7, g8, g1). The distribution of feature points overall and for each image is shown in Figure 8. The results of camera calibration are shown in Tables 3 and 4. After the calibration is completed, the re-projection errors of known ground control points and of unknown points obtained from multi-view space intersection are analyzed. Experimental analysis shows that the re-projection error is less than 0.2 pixels.
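The re-projection error check can be expressed compactly. The sketch below (names ours) computes the RMS re-projection error in pixels, given a projection function with the calibrated parameters fixed, such as the project_control_point sketch above.

```python
import numpy as np

def reprojection_rmse(points3d, observations, project):
    """RMS re-projection error (pixels) over control or check points.

    project : callable mapping a ground point (X, Y, Z) to image
              coordinates with the calibrated parameters held fixed
    """
    errs = [np.linalg.norm(project(P) - np.asarray(obs, dtype=float))
            for P, obs in zip(points3d, observations)]
    return float(np.sqrt(np.mean(np.square(errs))))
```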

Result and Analysis of Accurate Registration
The iterative registration process is shown in Figure 10. The experiment used 20 sets of data from two common scenes, collecting obvious feature points such as window corners, floor intersections, and obvious boundaries of natural features as registration points. According to the statistical results from a large amount of data, the registration success rate of the proposed method reaches 98.1%, and the accuracy rate reaches 97.06%. The success rate and accuracy rate of each group of experimental data are shown in Table 7. The success rate equals the number of successful cases divided by the number of selected cases, while the accuracy rate equals the number of right cases divided by the number of successful cases. The success rate indicates how many of the selected sample points were successfully registered; successfully registered points are not necessarily the samples of interest, and the proportion of correct registrations of interest is the accuracy rate.
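The two rates are simple ratios and can be computed per data set as in the sketch below; the counts in the usage line are hypothetical placeholders, not values from Table 7.

```python
def registration_rates(selected, successful, right):
    """Success and accuracy rates as defined in the text."""
    success_rate = successful / selected     # registered points / selected points
    accuracy_rate = right / successful       # correctly registered / registered
    return success_rate, accuracy_rate

# hypothetical counts for one data set (not taken from Table 7)
print(registration_rates(selected=104, successful=102, right=99))
```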
From the experimental results, it can be seen that some cases failed to match. Through analysis, the reasons for the failure of some sample points are as follows: (1) the features are not clear enough; (2) there is local deformation of the 3D real scene model where the feature points are located, which results in incomplete elimination of the geometric deformation of the feature points relative to the image and affects the least squares matching.

Conclusions
The main contribution of this paper is a method for accurate registration of real-scene and surveillance camera images, which solves the technical difficulty that existing cameras do not have georeference information. The conversion relationship between the navigation coordinate system and the photogrammetric coordinate system is considered first, unifying image information and real scenes under the same coordinate reference, and then a mathematical model for camera internal parameter calibration is proposed. At the same time, an automatic misalignment angle calibration method based on the collinearity equations is used to calculate the camera misalignment parameters, and extracted feature points are then used for matching; at this point, rough matching is completed. However, due to the influence of zoom lenses, surface elevation errors, and attitude angle errors, accurate matching cannot yet be achieved. Therefore, to achieve accurate matching of real-scene and surveillance camera images, a support-window estimation method using least squares image matching based on pyramid images is proposed, and it achieves good results.
The theory and method proposed in this paper for accurately registering real scenes and surveillance camera images represent an important step for the development of smart cities and digital cities. Compared to previous research, it has made great progress. If this technology is put into practice, there will be significant efficiency improvements in urban security, traffic management, and fire monitoring in China.
Since this research is pioneering, future work can explore more application directions for integrating surveillance camera image information and real scenes, as well as more efficient and accurate registration methods.
Author Contributions: J.L. conceived the idea and designed the experiments; Z.Z. performed the experiments and analyzed the data; Z.Z. wrote the main manuscript; J.L. and Z.Z. reviewed the paper; Y.C. obtained the experimental data; M.F. polished the language of the paper. Funding acquisition: J.L. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the National Natural Science Foundation of China, grant number 42171439, and the Qingdao Science and Technology Demonstration and Guidance Project, grant number 22-3-7-cspz-1-nsh.

Figure 1. The integration of surveillance video images and real scenes.

Figure 2. The relationship between the navigation coordinate system and the projection coordinate system.

Figure 3. The process of camera and misalignment calibration.

Figure 4. The schematic diagram of internal calibration and misalignment angle calculation.

Figure 5. The process of least squares image matching.

Figure 6. The process of accurate registration.

Figure 9. The rough registration of real scenes and surveillance camera images.

Figure 10. Iterative process: (a) Scene 1; (b) Scene 2; (c) Scene 3; (d) Scene 4; (e) Scene 5. The first image in each group represents the source image, the second represents 1 iteration, the third represents 5 iterations, the fourth represents 50 iterations, and the fifth represents 100 iterations.

Table 3. The calibration result of camera parameters.

Table 7. The success rate and accuracy rate of the experiment.