1. Introduction
With the rapid progress of urbanization in China, video surveillance has been deployed on urban buildings, roads, military strongholds, factories, and so on [1,2,3], and is responsible for public security management, road control, and the detection of illegal intrusion and illegal operations. However, the amount of data is massive, making it difficult to observe regions of interest. When the scale of a surveillance system exceeds the monitoring capabilities of humans, security operators must mentally map each surveillance monitor image to a corresponding area in the real world. This process is very abstract and requires prior training for viewers [4]. Thus, the traditional method of manually watching and analyzing videos is no longer applicable, and intelligent video surveillance systems have emerged in response. In recent years, intelligent monitoring devices have developed rapidly and have been integrated with Geographic Information Systems (GIS). However, registration accuracy remains insufficient, resulting in low positioning accuracy.
Although thousands of cameras collect a large amount of data every day [3], their full potential has not been realized. The most important drawback is that existing cameras lack georeference information. By combining the image information obtained by cameras with geographic information, real-time information about a given location can be retrieved quickly. China is currently promoting digital construction on a large scale, and the research in this paper provides an important theory and method for advancing smart cities. With the combination of cameras and geographic information, cameras will no longer play only a monitoring role; instead, each camera will become a powerful data source for urban resource monitoring, urban security management, forest fire prevention, and other applications. If a traffic accident occurs somewhere in a city, the relevant surveillance video of the accident location cannot be retrieved quickly without georeferencing [5]. Similarly, a forest fire or an illegal building at a certain location cannot be located quickly. If intelligent monitoring is integrated with thermal sensors and actuator systems, the temperature at an ignition point can be monitored by the thermal sensors and immediate feedback provided to the fire department, thereby minimizing losses and even preventing fires altogether.
Therefore, this paper proposes a camera georeferencing method based on misalignment calibration and least squares image matching, which solves the problem that camera images cannot be quickly and accurately located and achieves the integration of image information and GIS information [6].
The innovation of this paper lies in proposing a new mathematical model to calculate the camera parameters and misalignment parameters, as well as a method to achieve accurate registration between surveillance camera images and real scenes. Rapid and accurate matching of real-scene information and surveillance camera information is thereby realized, achieving the goal of quickly locating a place of interest and retrieving the relevant images [7].
The remainder of this paper is organized as follows: Section 2 reviews related work on video surveillance, the integration of real-scene information with GIS information, and camera calibration. Section 3 describes the proposed methods and their key steps in detail, including the transformation of the navigation coordinate system, a method for obtaining the internal parameters and misalignment parameters, and a method using least squares image matching based on pyramid images to achieve accurate registration. Section 4 presents the implementation of the proposed method and the experimental results obtained. Finally, Section 5 presents the conclusions and prospects for future research.
2. Related Works
As early as 1942, Siemens AG installed the first video surveillance system in Germany to monitor the launch of V-2 rockets [8]. Later, in order to combat crime, the US installed video surveillance on its main commercial streets in 1968. These were traditional systems based on matrices of video displays, maps, and indirect controls. In contrast, the goal of intelligent video surveillance is to efficiently extract useful information from large volumes of surveillance video by automatically detecting, tracking, and identifying objects of interest and by understanding and analyzing their activities.
Modern video surveillance systems rely on automation through intelligent video surveillance and on better display of surveillance data through context-aware solutions and integration with virtual GIS environments [9]. Souleiman et al. used geospatial data for camera pose estimation and conducted 3D building reconstruction, proposing a method based on GPS measurements, video sequences, and registration with rough 3D building models [10]. Schall et al. proposed a method that relies on GPS and an inertial measurement unit (IMU) to perform camera attitude estimation, thereby enhancing the visualization of underground infrastructure in augmented-reality GIS applications [11]. Lewis et al. made use of georeferenced video data, focusing on Viewpoint data structures to represent video frames for geospatial analysis, and considered the potential of spatial video as georeferenced video data [12]. Xie et al. proposed the integration of GIS and moving objects in surveillance videos by using motion detection and spatial mapping [13]. Over three years, Robert T. Collins et al. developed a VSAM testbed system based on video surveillance and monitoring data; the system achieves automatic tracking of targets [4]. The purpose of the above work is to integrate image information and GIS information with the aim of augmenting reality; however, these methods cannot achieve accurate matching between the real scene and surveillance camera images.
In terms of camera calibration, Zhang proposed a simple calibration technique that determines radial distortion by observing a planar pattern shown at a few different orientations [14]. Lee and Nevatia developed a video surveillance camera calibration tool for urban environments that relies on vanishing point extraction [15]. Vanishing points are easily obtainable in urban environments because of the many parallel lines formed by street markings, light poles, buildings, etc. The calibration of environmental camera images by means of the Levenberg-Marquardt method has been studied by Muñoz et al. [16]. Although these calibration methods perform well, they lack universality. Based on the characteristics of the information obtained from real scenes and surveillance camera images, this paper proposes a new mathematical correction model to solve for the camera parameters.
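To make the general idea of solving for camera parameters from point correspondences concrete, the classical Direct Linear Transform (DLT) can be sketched as follows. This is a minimal textbook-style illustration, not the correction model proposed in this paper; the function names are our own:

```python
import numpy as np

def dlt_projection(world_pts, image_pts):
    """Estimate a 3x4 projection matrix from at least six
    2D-3D point correspondences via the Direct Linear Transform."""
    A = []
    for (X, Y, Z), (u, v) in zip(world_pts, image_pts):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    # The solution (up to scale) is the right singular vector of the
    # stacked constraint matrix with the smallest singular value.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    return Vt[-1].reshape(3, 4)

def project(P, world_pts):
    """Project 3D points with P and dehomogenize to pixel coordinates."""
    Xh = np.hstack([world_pts, np.ones((len(world_pts), 1))])
    uvw = Xh @ P.T
    return uvw[:, :2] / uvw[:, 2:3]
```

With exact, non-coplanar correspondences, the estimated matrix reproduces the input image coordinates up to numerical precision; in practice the correspondences are noisy, which is why refined calibration models such as the one proposed here are needed.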
In research on automatic feature point detection, many researchers have compared and analyzed various extraction algorithms [17,18,19]. In addition, F. Remondino et al. noted in 2014 that image matching is one of the key steps in 3D modeling and mapping [20]. Saleem et al. conducted a matching study between remote sensing images and UAV imagery in 2016 [21]. In 2017, Xiaohui Yuan et al. proposed a method that uses a time-of-flight camera for feature point detection and action tracking [22].
Although many researchers have proposed good ideas and put them into practice, several shortcomings remain. Studies over the same field have used different feature points and evaluated their performance separately. Moreover:
In traditional monitoring, when the scale of the surveillance system exceeds the monitoring capabilities of humans, security operators must mentally map each surveillance monitor image to a corresponding area in the real world [9];
Existing methods are manually operated, leaving great automation potential unrealized;
Existing methods cannot achieve accurate registration between images and the actual ground.
Our research overcomes the above problems, achieving the integration of image information and real scenes with fast and accurate matching. In recent years, China has vigorously pursued digitization, and this research provides powerful theories and methods for the progress and development of digital cities, making important contributions to China’s social development and urban progress.
5. Conclusions
The main contribution of this paper is a method for the accurate registration of real scenes and surveillance camera images, which solves the technical difficulty that existing cameras lack georeference information. First, the conversion relationship between the navigation coordinate system and the photogrammetric coordinate system is established, unifying image information and real scenes under the same coordinate reference; the paper then proposes a mathematical model for camera internal parameter calibration. An automatic misalignment angle calibration method based on the collinearity equations is used to calculate the camera misalignment parameters, and the extracted feature points are then matched. At this point, rough matching is complete. However, due to the influence of zoom lenses, surface elevation errors, and attitude angle errors, accurate matching cannot yet be achieved. Therefore, to achieve accurate matching of real scenes and surveillance camera images, a support-window estimation method using least squares image matching based on pyramid images is proposed, which achieves good results.
The theory and method of accurately registering real scenes and surveillance camera images proposed in this paper represent an important step toward the development of smart cities and digital cities, and constitute substantial progress over previous research. If this technology is put into practice, it will bring significant efficiency improvements to urban security, traffic management, and fire monitoring in China. As this is pioneering work, future research can explore more application directions for integrating surveillance camera image information with real scenes, as well as more efficient and accurate registration methods.