Integration of GIS and Moving Objects in Surveillance Video

This paper discusses the integration of a geographic information system (GIS) and moving objects in surveillance videos (“moving objects” hereinafter) by using motion detection, spatial mapping, and fusion representation techniques. This integration aims to overcome the limitations of conventional video surveillance systems, such as low efficiency in video searching, redundancy in video data transmission, and insufficient capability to position video content in geographic space. Furthermore, a model for integrating GIS and moving objects is established. The model includes a moving object extraction method and a fusion pattern for GIS and moving objects. From the established integration model, a prototype of GIS and moving objects (GIS–MOV) system is constructed and used to analyze the possible applications of the integration of GIS and moving objects.


Introduction
Video surveillance is conducted using images produced by a finite number of cameras. Millions of cameras are collecting massive amounts of video data on a daily basis [1]. The increasing number of installed cameras, accompanied by the increasing amount of video data, has created several challenging tasks for security monitoring systems, such as spatial-temporal behavior analysis of moving objects in surveillance video ("moving objects" hereinafter), video scene simulation, and regional status monitoring, which cannot be accomplished by relying on surveillance video images alone. The video geographic information system (V-GIS) was established to overcome the limitations of traditional security monitoring systems. It is a geographic environment sensing and analysis platform that organically integrates the traditional video analysis system and GIS. Using a unified geographic reference, geospatial data services can support the intelligent analysis of monitoring images to implement various functions, such as video data management [2], video image spatialization [3], and actual-reality fusion [4]. Surveillance video data have several disadvantages, including massive data volume, sparse distribution of high-value information, complex semantics, and unstructured data organization; thus, they fall short of realizing the functions of V-GIS.
Recent studies on V-GIS focus on the aforementioned disadvantages of video data. Kong et al. [5] proposed a geo-video data model that can structurally process video data. Xie et al. [6] proposed a hierarchical semantic model for geo-video to represent geographic video semantics. Milosavljević et al. [7] implemented efficient storage, analysis, and representation of monitoring video and geographic scenes by integrating GIS and surveillance video. However, studies

Related Work
To cope with the increasing number of installed cameras, modern video surveillance systems depend on automation through intelligent video surveillance and on better representation of surveillance data through context-aware solutions that use GIS. Specifically, to extract dynamic video information, intelligent analysis, such as moving object detection and tracking, is executed. To position moving objects in a geographic space, the geo-spatialization of video images is necessary. To present video information and GIS together, a fusion representation for video and the virtual GIS environment should be established. In this section, we introduce the related work on three aspects: extraction of moving objects in a video, geo-spatialization of a video, and fusion of GIS and video.
The objective of moving object detection and tracking is to extract the information on the spatial-temporal positions of moving objects in an image. Moving object detection extracts the sub-images of moving objects from the background in each sequence image. Moving object detection methods can be classified into three categories: background difference method [9,10], inter-frame difference method [11], and optical flow method [12]. Moving object tracking determines the attributes of moving objects, such as speed, position, motion trajectory, and acceleration [13]. The main methods for moving object tracking are divided into four categories: area-based tracking [14], active contour-based tracking [15], feature-based tracking [16], and model-based tracking [17].
The study of video geo-spatialization focuses on constructing the mapping relationship between the spatial sampling point set and the geospatial sampling point set. The methods for video geo-spatialization are divided into two categories: methods based on a homography matrix [18] and methods based on the intersection between sight and DEM [19]. Methods based on a homography matrix require a constraint condition based on the assumption of a planar ground in the geographic space. After four or more points of the same name are found, the homography matrix can be solved. However, homography matrix-based methods are unsuitable for large-scale scenes or scenes with complex terrain. Furthermore, the need to create points of the same name limits these methods to a low degree of automation. Methods based on the intersection between sight and DEM are executed by solving the model of the sight line between the center of the camera and the image pixels. These methods require a high-precision DEM and are suitable for small-scale scenes with few artificial objects.
In recent years, other mapping methods have been reported. Lewis et al. [20] used a perspective projection model to simulate the video and GIS mapping process. Milosavljević et al. [7] adopted a reverse process by back projecting the position-determined objects onto the video image. These new methods require a high-precision camera.
The integration of GIS and video aims to unify the representations of video information collected from different geographic locations [21][22][23][24]. The video images are displayed in a unified view by using a virtual scene model. Katkere [25] integrated GIS and video for the first time by using different mapping methods for different representations of moving objects and scenes in a video and constructed a system for generating an immersive environment by using multi-camera video data. According to the representation style, GIS and video fusion methods are divided into two categories: fusion of GIS and video image [20] and fusion of GIS and video object [1]. In the fusion methods for GIS and video image, video images are directly displayed in the corresponding positions by using the camera parameters in the virtual scene. In the fusion methods for GIS and video object, the video background and moving foreground objects are displayed in the corresponding positions separately in the virtual scene.

Integration of GIS and Moving Objects
A surveillance video is a sequence of frame images. Each video image is a two-dimensional integer matrix. Video images are unstructured data that cannot be directly used for analytical comprehension, whereas moving objects are structured data that can be analyzed and understood. The integration of GIS and moving objects is a potential upgrade in the information fusion of GIS and video for comprehensive analysis and visualization. Extracting information from moving objects is vital in the integration of GIS and moving objects. This integration involves the following steps: extracting and georeferencing the moving objects in a surveillance video, selecting the fusion pattern for GIS and moving objects, and then representing them together.
The key technologies of the integration of GIS and moving objects in video are shown in Figure 1.
ISPRS Int. J. Geo-Inf. 2017, 6, 94

Extraction and Georeferencing of Moving Objects
Figure 1. Key technologies of the integration of GIS and moving objects in video.
Moving object detection: Moving object detection is achieved by a background difference method, which constructs a background model B and uses the difference between the current image and the background image to detect moving objects. Each video frame Z_n is compared with the background model B, and the region in which the frame differs from B is detected as the foreground area I_n of the video frame.
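The background difference step can be illustrated with a minimal sketch. The detection equation itself is not reproduced in this text, so the thresholded absolute difference, the `threshold` value, and the running-average update of B used below are assumptions chosen for illustration rather than the authors' formulation; the function names are hypothetical.

```python
import numpy as np


def detect_foreground(frame, background, threshold=30):
    """Return a boolean mask I_n that is True where frame Z_n differs from B.

    Assumption: a pixel is foreground when the absolute difference exceeds
    a fixed threshold. The source text does not state the actual rule.
    """
    # Cast to a signed type so that uint8 subtraction cannot wrap around.
    diff = np.abs(frame.astype(np.int32) - background.astype(np.int32))
    if diff.ndim == 3:
        # For color frames, use the largest per-channel difference.
        diff = diff.max(axis=2)
    return diff > threshold


def update_background(background, frame, alpha=0.05):
    """Running-average background update (an assumed way to maintain B)."""
    return (1.0 - alpha) * background.astype(np.float64) + alpha * frame.astype(np.float64)


# Toy example: a 4x4 dark background with a bright 2x2 object in the frame.
background = np.zeros((4, 4), dtype=np.uint8)
frame = background.copy()
frame[1:3, 1:3] = 200
mask = detect_foreground(frame, background)
```

In practice the mask would usually be cleaned with morphological filtering and split into connected regions before the sub-images of individual objects are cut out; those steps are omitted here.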

Moving object storage: After the foreground region I_n is obtained from different video frames, the attributes of the moving object, such as moving speed, position, motion trajectory, and acceleration, can be obtained by video moving object tracking [13]. Subsequently, the corresponding storage model is constructed to record the moving object information. In the general representation of the moving object storage model, O denotes the set of all information of a moving object; C denotes the set of position information of the moving object in each frame; F_c denotes the sub-image of each frame of the moving object, the relevant attributes, and other collected data; f^1_i, f^2_i, … denote the different characteristics of the moving object in each frame; F_g denotes the set of statistical information of the moving object in an entire cycle; and g_1, g_2, … represent the global characteristic information of the moving object.
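One way to picture the storage model is as a small record type. The sketch below is only an illustration under stated assumptions: the class and field names (`FrameRecord`, `MovingObject`, `summarize`) are hypothetical, and the particular global statistics computed for F_g (path length, duration, mean speed) are examples, not the set used by the authors.

```python
import math
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class FrameRecord:
    """Per-frame data of one moving object."""
    frame_index: int
    position: Tuple[float, float]  # one element of C
    features: Dict[str, Any] = field(default_factory=dict)  # f^1_i, f^2_i, ... (part of F_c)


@dataclass
class MovingObject:
    """O: all stored information about one moving object."""
    object_id: str
    frames: List[FrameRecord] = field(default_factory=list)
    global_features: Dict[str, Any] = field(default_factory=dict)  # g_1, g_2, ... (F_g)

    @property
    def positions(self) -> List[Tuple[float, float]]:
        """C: the position of the object in each frame."""
        return [record.position for record in self.frames]

    def summarize(self, fps: float) -> Dict[str, Any]:
        """Fill F_g with example whole-cycle statistics (assumed, not from the source)."""
        pts = self.positions
        length = sum(math.hypot(x2 - x1, y2 - y1)
                     for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
        duration = 0.0
        if self.frames:
            duration = (self.frames[-1].frame_index - self.frames[0].frame_index) / fps
        self.global_features = {
            "path_length": length,
            "duration_s": duration,
            "mean_speed": length / duration if duration > 0 else 0.0,
        }
        return self.global_features


# Toy example: an object seen in frame 0 and frame 25 of a 25 fps video.
obj = MovingObject("obj-1", [FrameRecord(0, (0.0, 0.0)), FrameRecord(25, (3.0, 4.0))])
stats = obj.summarize(fps=25.0)
```

Keeping the per-frame records and the whole-cycle statistics in separate fields mirrors the split between F_c and F_g described above.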
Spatial mapping: The fusion of GIS and moving objects should be executed by determining the sight region of the camera (Figure 2a). Subsequently, the position of each moving object in every video frame image is located in the geographic space (Figure 2b).
The geospatial video mapping equations are established using the homography matrix method. The relationship between the geospatial coordinate system and the image space coordinate system is shown in Figure 3. The center of the station is denoted by C, the image space coordinate system is denoted by O_i X_i Y_i, and the geospatial coordinate system is denoted by O_g X_g Y_g Z_g.
Assume that q is a point in the image space coordinate system, Q is a point in the geographic coordinate system, and the two form a pair of points with the same name. Let the homography matrix M express the relationship between q and Q. M has six unknowns; thus, at least three pairs of image points and geospatial points should be determined to solve M. When M is determined, the coordinates of any point in the geographic space can be solved.
Representation: The geospatial position of each sub-image of every moving object is obtained by spatial mapping. The sub-images are then fused and displayed in the virtual scene at their corresponding geospatial positions, as shown in Figure 4.
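The solution of M from point pairs can be sketched as follows. The matrix equations are not reproduced in this text, so the sketch assumes that M is a 3 × 3 matrix acting on homogeneous coordinates whose last row is fixed to (0, 0, 1), which leaves the six unknowns mentioned above, and that M maps image points to points on the ground plane. Both the form of M and the mapping direction are assumptions; the function names are hypothetical.

```python
import numpy as np


def solve_mapping(image_pts, geo_pts):
    """Estimate M (six unknowns, last row fixed to 0 0 1) by least squares.

    image_pts, geo_pts: sequences of (x, y) and (X, Y) pairs with the same
    name. At least three non-collinear pairs are needed.
    """
    image_pts = np.asarray(image_pts, dtype=float)
    geo_pts = np.asarray(geo_pts, dtype=float)
    if image_pts.ndim != 2 or image_pts.shape[1] != 2 or image_pts.shape != geo_pts.shape:
        raise ValueError("expected two arrays of shape (n, 2)")
    if len(image_pts) < 3:
        raise ValueError("at least three point pairs are required")
    # Each pair gives two linear equations: [x, y, 1] @ X = [X_geo, Y_geo].
    design = np.hstack([image_pts, np.ones((len(image_pts), 1))])
    solution, *_ = np.linalg.lstsq(design, geo_pts, rcond=None)
    M = np.eye(3)
    M[:2, :] = solution.T
    return M


def image_to_geo(M, point):
    """Map one image point to the geographic plane with M."""
    x, y = point
    X, Y, w = M @ np.array([x, y, 1.0])
    return X / w, Y / w


# Toy example: scale by (2, 3) and shift by (10, 20).
M = solve_mapping([(0, 0), (1, 0), (0, 1)], [(10, 20), (12, 20), (10, 23)])
X, Y = image_to_geo(M, (2, 2))
```

With more than three pairs, the least-squares solve averages out small errors in the manually selected points. A full eight-parameter homography would need at least four pairs, as noted in the related work.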

Fusion between Surveillance Video and Geographic Scene
The fusion patterns for surveillance video and geographic scene are divided into two categories, namely, image projection pattern and object projection pattern (Figure 5). Studies on image projection patterns include those of Roth [26] and Chen [27]. In this pattern, video images are projected as a texture map onto the surface of the geographic scene model, and the video images are represented in the virtual scene. In object projection patterns, the foreground information and background information of the video are extracted and represented in the virtual scene. According to the differences in the projected information, object projection patterns are divided into three categories: foreground and background independent projection [28][29][30], foreground projection [31], and abstract of foreground projection [32][33][34]. In foreground and background independent projection, the background is separated first; the sub-images of the moving foreground objects are then projected onto the corresponding spatial-temporal position in the geographic scene, whereas the video background images are projected as a texture map onto the surface of the geographic scene model to achieve fused representation. Foreground projection only projects the sub-images of the moving foreground objects onto the corresponding spatial-temporal position in the scene and omits the projection of the video background images. In abstract of foreground projection, the sub-images of the moving foreground objects are replaced with semantic icons, and then these icons are projected onto the corresponding spatial-temporal position in the geographic scene. Table 1 shows a comparison of the visualization capabilities of the different fusion patterns.
As shown in Table 1, the image projection pattern can satisfy the demand for representing virtual information in a range view in the virtual scene and partially reflect the video image space information. However, owing to the lack of an intelligent analysis of the dynamic video information and the differences in the projection between the background and moving foreground objects in the 3D scene, some projection errors are induced in this method. For example, moving objects are projected as a background portion onto the floor or wall. Both the foreground and background independent projection and foreground projection patterns can satisfy the demand for representing virtual information in a range view in the virtual scene and can reflect the spatial information of the moving objects to a certain extent. In contrast, the abstract of foreground projection can fully support any virtual viewpoint in the virtual scene, allowing for browsing of the moving objects. However, this pattern completely abandons the representation of the original video image data in the fusion representation process. Consequently, its loss of visualization content is more substantial than that of the other patterns.

Architecture of GIS-MOV Surveillance System
On the basis of the integration model of GIS and moving objects, as well as the moving object extraction and spatialization method, a prototype called the GIS and moving objects (GIS-MOV) system is implemented. This system can store information from GIS, video images, and moving objects independently, display them, and analyze them integrally. This system can assist supervisors in understanding the geospatial and video contents quickly and effectively.

Design Schematic of the System
The overall system design follows the framework of service-oriented software architecture. From the bottom up, the framework of the system is divided into the function layer, data layer, service layer, business layer, and representation layer, as shown in Figure 6.
(1) Function layer: The function layer is a server with data processing and analysis functions. It is used for pre-processing GIS and video data. This layer has functional modules for video data acquisition, detection and tracking of moving object trajectories, and geospatial mapping of video data. The real-time video data processing, such as the extraction of moving objects and the spatialization of moving object trajectories, is executed in this layer. Thus, the function layer can provide the basic data support for real-time publishing.
(2) Data layer: Supported by the database, the data layer is mainly used for storing, accessing, and managing geospatial data, video image data, and video moving object data, and for providing data services to clients.
(3) Service layer: The service layer publishes the data service of the underlying database of the system, including video stream image data service, video moving object data service, and geospatial information data service. This layer provides real-time multi-source data services to terminal users and remote command centers.
(4) Business layer: The business layer selects the relevant data service content according to the demand of the monitoring system user. Through analysis, it fetches different services and generates and transmits the corresponding result to the representation layer.
(5) Representation layer: In the representation layer, users can obtain the multi-pattern fusion representation of the GIS-moving object and the visualization output of the related application analysis function in the common operating system platform.

Design of System Functions
This section describes the modules in the function layer and their functional support relationships, as shown in Figure 7.
scene, virtual point of view, position of the surveillance camera, and sight of video image. The virtual scene generation module analyzes the virtual scene of the virtual point and the relative position between camera and video sight. It also judges the accessibility of the virtual viewpoint and the camera sight. In other words, this module builds the foundation for realizing fusion representation; many applications based on the GIS-MOV system are carried out on the condition that this module is established. Section 5 will discuss these applications in detail.
(4) Moving object spatial-temporal analysis module: To achieve some specific applications, this module conducts a comprehensive analysis of the related information of the video moving objects and the geographic scene. It also obtains the necessary result to be outputted in the representation module.
(5) Fusion representation module: This module is used to select the fusion pattern between the moving objects and the virtual geographic scene. It performs visual loading on video images, moving object trajectory, sub-images, avatars, and spatial-temporal analysis results.

Operation Flow of the System
The server system adopts a plugin design, that is, it can load and unload plugins with different functions by using a unified access interface. The workflow of the system is shown in Figure 8. On the basis of the unified access interface of the system platform, plugins for virtual scene generation, moving object extraction, video spatialization, spatial-temporal analysis function, and virtual and real fusion representation are designed. The operation flow of the system is as follows: (1) virtual scene generation;
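A plugin design with a unified access interface can be sketched roughly as below. The source does not describe the actual interface of the GIS-MOV server, so every name here (`Plugin`, `PluginHost`, `StepPlugin`, `run_pipeline`) is hypothetical, and the execution order in the example simply follows the order in which the plugins are listed above, not the full operation flow, which is not recoverable from this text.

```python
from typing import Any, Dict, List, Protocol


class Plugin(Protocol):
    """Unified access interface (hypothetical): every plugin has a name and a run method."""
    name: str

    def run(self, context: Dict[str, Any]) -> Dict[str, Any]: ...


class PluginHost:
    """Loads and unloads plugins and runs them through the same interface."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Plugin] = {}

    def load(self, plugin: Plugin) -> None:
        self._plugins[plugin.name] = plugin

    def unload(self, name: str) -> None:
        del self._plugins[name]

    def run_pipeline(self, order: List[str], context: Dict[str, Any]) -> Dict[str, Any]:
        # Each plugin receives the shared context and returns the updated one.
        for name in order:
            context = self._plugins[name].run(context)
        return context


class StepPlugin:
    """Toy plugin that only records that it ran."""

    def __init__(self, name: str) -> None:
        self.name = name

    def run(self, context: Dict[str, Any]) -> Dict[str, Any]:
        context.setdefault("steps", []).append(self.name)
        return context


# Assumed order, taken from the plugin list in the text.
order = [
    "virtual_scene_generation",
    "moving_object_extraction",
    "video_spatialization",
    "spatial_temporal_analysis",
    "fusion_representation",
]
host = PluginHost()
for plugin_name in order:
    host.load(StepPlugin(plugin_name))
result = host.run_pipeline(order, {})
```

Because the host depends only on the `name` and `run` members, a plugin can be replaced or removed without changing the host, which is the property that the plugin design is meant to provide.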


Applications Based on the GIS-MOV System
In this section, we briefly introduce some applications based on the GIS-MOV system and evaluate the advantages of the applications compared with the traditional surveillance video system.


Multiple Fusion Patterns for the Fusion of Moving Objects and Geographic Scene
In the traditional video surveillance monitoring system, the video information is represented as original sequence images, which cannot adequately depict the spatial information associated with the video.Furthermore, the fusion pattern used to map the entire video image to the virtual geographic scene cannot highlight the moving object, and this pattern is unfavorable for the retrieval of interesting video information.Compared with the traditional video surveillance interface, the GIS-MOV system changes the traditional camera-centric video representation approach and achieves the geospatial display of moving objects, along with the fusion of the image space and geospatial information.Furthermore, the proposed system can create a multi-pattern visual representation of the 3D geo-scene and moving object by constructing an independent display channel.The information representation effects of each fusion pattern are described as follows: Data access form No. 1: Trajectory + Sub-images + Background images + 3D virtual scene model (Figure 9).This data access form corresponds to the foreground and background independent projection pattern.In this form, the video images are transformed into the trajectories and sub-images of the objects as well as background images.The sub-images are then used as the attribute data of the trajectories of the objects, and the background images are mapped to the virtual scene model as the attribute data of the camera.
Data access form No. 1: Trajectory + Sub-images + Background images + 3D virtual scene model (Figure 9).This data access form corresponds to the foreground and background independent projection pattern.In this form, the video images are transformed into the trajectories and sub-images of the objects as well as background images.The sub-images are then used as the attribute data of the trajectories of the objects, and the background images are mapped to the virtual scene model as the attribute data of the camera.Data access form No. 2: Trajectory + Sub-images + 3D virtual scene model (Figure 10).This data access form, which corresponds to the foreground projection pattern, only maps the sub-image as the attribute data of the trajectories in the virtual scene and omits mapping of the video background.The visualization result of this form is shown in Figure 11.Data access form No. 3: Trajectory + Semantic symbols + 3D virtual scene model (Figure 12): This data access form, which corresponds to the foreground abstract fusion representation pattern, maps predefined semantic symbols as the attribute data of the trajectories in a virtual scene.The visualization result of this form is shown in Figure 13.Data access form No. 2: Trajectory + Sub-images + 3D virtual scene model (Figure 10).This data access form, which corresponds to the foreground projection pattern, only maps the sub-image as the attribute data of the trajectories in the virtual scene and omits mapping of the video background.The visualization result of this form is shown in Figure 11.
Compared with the original video image (Figure 14), the fusion representation between the 3D geographic scene and moving object may still be improved. The foreground projection pattern maps the video foreground to the geographic scene and loses the video background information in the visualization. The foreground abstract fusion representation pattern maps the predefined semantic symbols to the virtual scene and effectively represents the trajectory information of the moving object. However, it loses the image texture information of the moving object in the visualization, and the relevant image information has to be reviewed in the original video through temporal positioning.

Video Compression Storage
In object projection fusion, the dynamic video information is stored as different kinds of data, and video data compression occurs in the process of fusion between real and virtual information [35]. This type of video compression converts video information from the image level to the object level. The hierarchical relationship diagram of the data compression is presented in Figure 15.

On the basis of the data compression mechanism, video image compression is achieved by constructing predictive models, which predict video image pixels via intra-frame or inter-frame prediction. The models are mainly constructed in accordance with the H.264 standard. The purpose of video image compression is to reconstruct the original video from compressed data; thus, the capability of recovering the original video images must be taken into account. By contrast, the purpose of video compression in the fusion of real and virtual information is to represent video information in simplified forms, such as showing only the sub-images or avatars of the moving object. When such a representation is fused with a virtual scene, the capability of recovering the original video images need not be considered. In terms of the data compression effect, the video data used in the object projection fusion patterns have data compression relations with the original video sequence images. Furthermore, data compression relations exist between the three patterns of object projection fusion.
First layer of compression: In this layer, the compressed data are oriented to the foreground and background independent projection pattern. The sub-images of the moving objects, spatial-temporal position, and background image are extracted and stored separately. This compression layer converts video information from the image level to the object level.
Second layer of compression: In this layer, the compressed data are oriented to the foreground projection pattern, and the virtual scene model is used instead of the video background. This compression layer transfers the background representing the camera view from the image to the virtual scene model.
Third layer of compression: In this layer, the compressed data are oriented to the foreground abstract fusion representation pattern. A virtual avatar in the form of a semantic symbol is used instead of the sub-images of the moving foreground object to display dynamic video information in a virtual geographic scene. In the third layer of compression, spatial-temporal position is the only information that needs to be obtained from the original video.
To test the compression efficiency of the data for storage, we examined a set of video images and recorded the trend of the compression rate K_l with respect to the number of input video frames for the different layers. The experimental results are as follows:

In Figure 16, the magnitudes of compression in the first and second layers, i.e., K1 and K2, are on the order of 10^-3, whereas the magnitude of compression in the third layer, K3, is on the order of 10^-5. These results indicate that video compression based on the integration of GIS and moving objects can effectively reduce the amount of video data.
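To make the relation between the three layers concrete, the sketch below computes a compression rate for each layer. It assumes that the rate is the ratio of stored bytes to the bytes of the original frame sequence, which is consistent with the reported magnitudes but is not defined in this section. The function name, parameters, and all sizes are hypothetical and are not the experimental data behind Figure 16.

```python
def compression_rates(n_frames, frame_bytes, n_observations,
                      sub_image_bytes, position_bytes, background_bytes):
    """Return (K1, K2, K3) as stored size / original size (assumed definition).

    n_observations is the total number of per-frame object detections.
    """
    original = n_frames * frame_bytes
    positions = n_observations * position_bytes      # spatial-temporal positions
    sub_images = n_observations * sub_image_bytes    # cropped foreground images
    k3 = positions / original                                   # layer 3: positions only
    k2 = (positions + sub_images) / original                    # layer 2: plus sub-images
    k1 = (positions + sub_images + background_bytes) / original # layer 1: plus one background
    return k1, k2, k3


# Illustrative numbers only: 1000 uncompressed 1920x1080 RGB frames,
# 5000 detections with 40x80 RGB sub-images and 24-byte (t, x, y) records.
k1, k2, k3 = compression_rates(
    n_frames=1000, frame_bytes=1920 * 1080 * 3, n_observations=5000,
    sub_image_bytes=40 * 80 * 3, position_bytes=24,
    background_bytes=1920 * 1080 * 3)
print(f"K1={k1:.2e}  K2={k2:.2e}  K3={k3:.2e}")
```

With these assumed sizes, the sub-images dominate the first two layers, which is why K1 and K2 stay close to each other while K3 falls by roughly two orders of magnitude.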

Clustering and Cluster Modeling of Moving Objects in the Geographic Space
The number of moving objects is considerable, and the spatial distribution of the trajectories of the moving objects is random. As a result, a manual statistical analysis of the moving objects is difficult. Thus, trajectory clustering is used to effectively analyze the moving object trajectories and perform data mining. After the constraint conditions have been defined, a similarity measurement and a clustering algorithm are used to group trajectories that share specific similarity features. In this process, the dimension of the trajectory information is reduced, thereby facilitating statistical analysis. Modeling the trajectory clusters allows for the visualization of the trajectory information. Furthermore, the geospatial distribution of the trajectory clusters can be easily recognized by the users of the monitoring system.
However, a problem exists in the current clustering and trajectory cluster modeling: the spatial-temporal features of a trajectory class in the geographic space cannot be represented. To solve this problem, we use the GIS-MOV system to cluster the moving objects on the basis of the geoscene constraints. Using the spatialization results as reference (Figure 17), we introduce the trajectory cluster modeling into geospatial processing and select the corresponding clustering algorithm to realize trajectory clustering. Finally, geospatial trajectory class modeling is realized by extracting the boundaries of the trajectory class and performing polynomial fitting.
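The clustering and fitting steps can be illustrated with a short sketch. The paper does not name its similarity measure or clustering algorithm, so the code below substitutes a mean point-to-point distance between resampled trajectories and a greedy threshold clustering. It also fits one polynomial through all points of a class as a centre curve, rather than fitting the class boundaries described above. All function names and the threshold are hypothetical.

```python
import math


def resample(traj, n=8):
    """Resample a polyline [(x, y), ...] with at least 2 points to n points by linear interpolation."""
    out = []
    for i in range(n):
        pos = i * (len(traj) - 1) / (n - 1)
        j = min(int(pos), len(traj) - 2)
        f = pos - j
        (x0, y0), (x1, y1) = traj[j], traj[j + 1]
        out.append((x0 + f * (x1 - x0), y0 + f * (y1 - y0)))
    return out


def distance(a, b):
    """Mean distance between corresponding points; sensitive to direction of motion."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def cluster(trajs, threshold):
    """Greedy clustering: a trajectory joins the first class whose seed is within threshold."""
    seeds, labels = [], []
    for t in map(resample, trajs):
        for k, seed in enumerate(seeds):
            if distance(t, seed) <= threshold:
                labels.append(k)
                break
        else:
            seeds.append(t)
            labels.append(len(seeds) - 1)
    return labels


def polyfit(points, degree=2):
    """Least-squares fit y = c[0] + c[1]*x + ... using the normal equations.

    Needs at least degree + 1 distinct x values, so it cannot fit a vertical class.
    """
    m = degree + 1
    rows = [[sum(x ** (i + j) for x, _ in points) for j in range(m)]
            + [sum(y * x ** i for x, y in points)] for i in range(m)]
    for c in range(m):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, m), key=lambda r: abs(rows[r][c]))
        rows[c], rows[p] = rows[p], rows[c]
        for r in range(m):
            if r != c:
                f = rows[r][c] / rows[c][c]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[c])]
    return [rows[i][m] / rows[i][i] for i in range(m)]


def fit_class(trajs, labels, k, degree=2):
    """Fit one centre curve through every point of trajectory class k."""
    points = [p for t, label in zip(trajs, labels) if label == k for p in t]
    return polyfit(points, degree)
```

The threshold is expressed in the units of the geographic coordinates, so clustering after spatialization lets it be chosen as a physical distance rather than a pixel count.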

Figures 18 and 19 show the results of the clustering method and trajectory cluster modeling of the moving objects, respectively. In Figure 18, the trajectories with different geographic characteristics are effectively differentiated. In Figure 19, the results of the trajectory modeling can effectively represent the geospatial features of a trajectory class, such as the direction of motion and the spatial distribution of the trajectories. These features can provide users with a clear picture of the general dynamic trends of the moving objects.

Video Synopsis on Geographic Scene
Surveillance videos captured by cameras contain huge amounts of data. However, valuable information, such as moving objects, is distributed sparsely, and the rest is redundant. Numerous studies have been conducted on video summarization technology to extract useful information from the massive and complex video data and allow for quick browsing [36,37]. Video summarization can be classified as image-level summarization and object-level summarization [38]. In image-level video summarization [39,40], a summary is constructed by reordering the original video key frames. In object-level video summarization [41], which is also known as video synopsis, a summary is constructed by extracting the foreground dynamic information and background static information of the original video, recombining them to assemble a new sequence of images, and finally reorganizing the video summary.
The current method of creating a video synopsis is to reconstruct the video images. However, this method cannot generate the corresponding representation featuring the geographic environment and moving objects. The GIS-MOV system can solve this problem by constructing the video synopsis and presenting it on the geographic scene. This method is based on the spatialization and trajectory clustering results. First, the background in the virtual scene is selected (Figure 20). Then, the method described in Section 5.3 is used to obtain the trajectory clusters of the moving objects. Finally, the pattern of trajectory class + virtual geographic scene is used to generate the video synopsis.
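The final step, in which trajectory classes are replayed together over the virtual scene, can be sketched as a re-timing of georeferenced tracks. The code below uses the temporal-shifting idea common to object-level synopsis and is not the GIS-MOV implementation; the function names and the stagger parameter are hypothetical.

```python
def build_synopsis(tracks, labels, stagger=1.0):
    """Re-time georeferenced tracks so that all trajectory classes play together.

    tracks: list of [(t, x, y), ...] in geographic coordinates, each ordered by time
    labels: trajectory class of each track
    Each track keeps its internal timing; the k-th track of a class (by original
    start time) begins at k * stagger in synopsis time.
    """
    next_slot = {}
    synopsis = []
    # Process tracks in order of their original start time.
    for track, label in sorted(zip(tracks, labels), key=lambda tl: tl[0][0][0]):
        k = next_slot.get(label, 0)
        next_slot[label] = k + 1
        t0 = track[0][0]
        start = k * stagger
        synopsis.append((label, [(t - t0 + start, x, y) for t, x, y in track]))
    return synopsis


def synopsis_length(synopsis):
    """Duration of the synopsis, i.e., the latest re-timed timestamp."""
    return max(track[-1][0] for _, track in synopsis)
```

Because only timestamps are shifted, the geographic positions and the motion order inside each track are unchanged, and no video background needs to be updated during playback.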

The experimental results in Figure 21 show that, compared with the video synopsis in the image space, the video synopsis on the geographic scene has several advantages. First, the spatial-temporal structure among different moving objects is expressed in the geographic space. Second, rapid browsing of the moving objects is enabled with the simulated geospatial behavior. Third, different trajectory clusters can be represented synchronously. These advantages may be ascribed to the following reasons. First, the moving objects can be efficiently represented in the geographic virtual scene after extraction and georeferencing. Second, in the geographic scene, the video background is replaced by the virtual geographic scene for representation, thereby avoiding the problem of having to update the video background constantly. Finally, the sub-images of the moving object are represented by the trajectory cluster model; as a result, the temporal structure of the moving objects is effectively preserved.

Conclusions
The objective of this paper is to integrate GIS and moving objects. This integration can assist users in understanding a video by associating the moving objects with the geospatial information, enhance the browsing efficiency of video information, and reduce the redundancy in video data transmission. For the integration process, the extraction and geo-spatialization of moving objects are necessary. The proposed integration method presents a significant improvement compared with the video-augmented GIS method. Compared with the previous integration of GIS and surveillance video, the proposed integration method can represent moving objects in the virtual geographic scene by different patterns to provide users with a clear understanding of the dynamic video information in the geographic space. The fusion models for GIS and moving objects are established by mapping moving objects to the virtual scene. The relevant fusion models are used as a basis for the construction of the prototype of the proposed GIS-MOV system. The system can generate a virtual geographic scene, extract moving objects, and generate a fusion representation. The main contributions of this paper are as follows: (1) defining the concept of GIS and moving object integration and providing different patterns to achieve this; (2) establishing a prototype of the GIS-MOV system, which is an open and extensible system; (3) describing the applications of the GIS-MOV system and analyzing the results of its implementation.
After analyzing the integration model and the results of the implementation of the GIS-MOV system, we believe that, compared with the integration of GIS and video image, the integration of GIS and moving objects has the following advantages: (1) providing a video-augmented GIS information representation pattern in which the virtual geographic scene is enhanced by the moving objects; (2) reducing the amount of data required for the fusion of GIS and video; (3) allowing for a flexible selection of video foreground and background represented in GIS; (4) efficiently and intensively representing moving objects in the geographic space; (5) increasing the spatial positioning accuracy of moving objects. However, the integration of GIS and moving objects still has several limitations: (1) video information loss (depending on the fusion patterns, some methods do not include background information, and some do not include the sub-images of the moving objects); (2) inability to represent complex dynamic video information (e.g., video images with a considerable amount of moving people or vehicles).
For the theoretical and practical analysis of the integration of GIS and moving objects, this paper only describes the situation in which video data are acquired by a single camera. Further study should address two main aspects: (1) the integration of GIS and moving objects extracted from multiple cameras in a camera network; and (2) the integration of GIS and moving objects extracted from moving cameras. Finally, we plan to study thoroughly the integration of GIS and moving objects from a camera network with multiple moving cameras.

Figure 1. Graph about the key technologies of integration of GIS and moving objects in video.

Figure 2. (a) The sight region of the camera; (b) The position of each moving object in the geographic space.

Figure 3. Camera and geospatial coordinate system, image space coordinate system.

Figure 4. Schematic of fusing displaying of moving objects' sub-images in the virtual geographic scene.

Figure 5. Fusion patterns for surveillance video and geographic scene.

Figure 6. Design schematic of the system.
ISPRS Int. J. Geo-Inf. 2017, 6, 94
(1) Moving object extraction module: This module uses detection and tracking algorithms to extract moving objects; separates the video foreground and background; and stores the trajectory, type, set of sub-images, and other associated information of the moving objects.
(2) Video spatialization module: This module constructs the mapping matrix by selecting the associated image space and geospatial mapping model and calibrates the internal and external parameters of the camera for video spatialization.
(3) Virtual scene generation module: This module is mainly used to load the virtual geographic

Figure 8. Operation flow of the system.

Figure 11. Visualization result of data access form No. 2.

Figure 13. Visualization result of data access form No. 3.

Figure 15. Diagram of the hierarchical relationship of video compression data.

Figure 17. Trajectories of moving objects. (a) Trajectories represented in image space; (b) trajectories represented in geographic space.

Figure 20. Background selection of video synopsis on geographic scene.

Figure 21. Comparison of visual effects between video synopsis on image space and video synopsis on geographic scene. (a) Video synopsis on geographic scene; (b) video synopsis on image.

Table 1. Analysis of the visualization capability of the fusion patterns.
