3D Perspective towards the Development of a Metadata-Driven Sharing Mechanism for Heterogeneous CCTV Systems

Abstract: The number of installed closed-circuit television (CCTV) systems has increased rapidly since the 11 September attacks. Because they allow direct visual inspection, CCTV systems are widely used wherever instantaneous or long-term monitoring is required. For emergency response tasks in particular, the prompt availability of CCTV gives Emergency Operation Center (EOC) commanders a much better reference for acting on reported incidents. However, the heterogeneity among CCTV systems impedes the effective and efficient use and sharing of CCTV services hosted by different stakeholders, so individual CCTV systems often operate in isolation, which restricts the possibility of taking full advantage of the huge number of existing CCTV systems. This research proposes a metadata-driven approach to facilitate a cross-domain sharing mechanism for heterogeneous CCTV systems. The CCTV metadata includes a set of enriched descriptive information based on an analysis of CCTV from the aspects of Who, When, Where, What, Why and How (5W1H). Sharing mechanisms based on standardised CCTV metadata can then meet the need for querying and selecting CCTVs across heterogeneous systems according to the task at hand. One distinguishing design feature is the modelling of the field of view (FOV) of CCTV from the 3D perspective. By integrating 3D feature-based city model data, the 3D FOV information not only provides better visualisation of the spatial coverage of the CCTV systems but also enables 3D visibility analysis of CCTV based on individual features, such that the selection decision can be further improved by indexing CCTVs and features. As the number and variety of CCTV systems continue to grow, the proposed mechanism has great potential to serve as a solid collaborative foundation for integrating heterogeneous CCTV systems for applications that demand a comprehensive and instantaneous understanding of the dynamically changing world, e.g., smart cities, disaster management and criminal investigation.


Introduction
Nowadays, CCTV cameras have penetrated our daily lives to such an extent that whenever we step outside, our every move may be recorded by CCTV systems we are not aware of. The rapid growth of CCTV systems in recent years can be attributed to the demand for environmental security and the rise of internet technology [1]. CCTV has become a major tool for government agencies, enterprises, and even individuals to protect people's lives and property and maintain social security [2]. According to a report by London-based IHS Markit, the number of CCTV cameras worldwide had reached 770 million by the end of 2020 [3], and they are used in various types of applications. For example, flooding is a major disaster concern for the Water Resources Administration (WRA), Ministry of Economic Affairs, so many flood sensors are deployed at facilities such as bridges to monitor river water levels [4]. When heavy rainfall occurs, a well-planned CCTV system can provide visual monitoring of the disaster-prone area and help to validate whether a disaster has really occurred. Since every CCTV system has limited visual coverage, it would be advantageous to integrate CCTV systems operated by other stakeholders for a better understanding of the actual situation. For example, CCTV systems operated by transportation agencies may provide visual information about flooding on specific roads or intersections and become a good complementary visual aid for the WRA. In criminal investigation, it has become standard operating procedure to acquire and examine all the CCTV recordings in the neighbourhood, regardless of the operators, to reconstruct the complete crime scene [5]. This shows that while each CCTV system is deployed for its intended purpose, its use could become multi-purpose, should an effective cross-discipline sharing mechanism be successfully developed.
However, most current systems are developed for their own particular purposes. Take the highway real-time road condition system of Taiwan (http://1968.freeway.gov.tw/?lang=en, accessed on 24 July 2021) as an example: this system provides a CCTV streaming service along the freeway with a map interface and the mileage at which each CCTV is deployed. Users are provided with a complete list of CCTVs to choose from, but what users can do is restricted by the system. Even if the URL of the service is available, only limited information about the CCTV is provided. Smart cities 3D Taipei (https://3d.taipei/, accessed on 24 July 2021) provides CCTV streaming services all over the city. Although it successfully integrates various themes of geographic data from city governments, the use of CCTV information is still limited to the map interface and streaming content. Expanding the integrated applications of current CCTV systems will require new thinking.
Although the importance of CCTV systems in disaster management has been well recognised, their potential has not yet been fully realised due to the lack of an effective, coordinated sharing mechanism. When managing a quickly increasing number of reported disaster incidents, commanders must promptly assess the threat and damage, then make timely and correct decisions based on the available information [6]. Visual inspection provides invaluable observations to validate the reported incidents and even helps to assess the levels of threat. As the hazards progress, CCTV systems can also serve as an indispensable aid to continuously monitor changes on the ground and issue early warnings to the people living in the neighbourhood. To effectively overcome the heterogeneity among CCTV systems and make the most of their recordings, it is necessary to consider the aspect of "standardisation" [7]. The development of standards implies establishing a written consensus agreement about the selected topic among the stakeholders from the related domains [8]. Standards are especially important for dealing with the heterogeneity issue and improving the interoperability of systems operated by different stakeholders [9,10]. They are especially helpful for facilitating the sharing of available resources and providing a better reference for decision making [11]. One of the purposes of a standards-based system is to develop a platform for the barrier-free flow and exchange of data [12,13]. The International Organization for Standardization Technical Committee 211 (ISO/TC 211) and the Open Geospatial Consortium (OGC) have been working on the development of GIS-related standards and specifications for almost three decades, and the results of standardisation have become evident [14]. Under the standardisation framework, the industry can design standard-compliant software to support the distribution of required information between different parties [15,16]. For example, information on various types of disasters, such as large-scale disasters or natural disasters, is communicated through a standardised system platform to the agencies responsible for emergency response [17,18].
The well-known definition of metadata is "data about data" [19,20], meaning that another piece of data is created to provide meaningful reference information about existing data so that users can build a correct understanding of the acquired data. The concept of metadata has been widely used for the discovery and use of digital resources on the internet [21]. The content of a metadata standard depends on the specific domain it intends to describe. In the geospatial domain, the ISO 19115 series of standards from ISO/TC 211 has been widely adopted by many countries and organisations [22]. ISO 19115 includes a number of standardised packages that can describe the various characteristics of a geospatial resource. Metadata created according to the chosen geospatial metadata standard then serves as the foundation for developing a geospatial resource sharing mechanism [23]. From these platforms, users may easily discover geospatial resources they have no prior knowledge of and assess their fitness for use based on the content of the metadata without ambiguity [24]. Metadata is therefore considered a necessary component for the successful development of national or regional spatial data infrastructure (SDI) [25]. The use of metadata can be extended to any chosen domain. For example, SensorML [26] from OGC is the standard specifically developed for describing the common characteristics of the sensors used in a Sensor Web Enablement (SWE)-based application environment. The metadata of sensors can provide a useful reference for users to evaluate the use of the obtained observations. With standardised metadata of sensors, we can manage the descriptive information of heterogeneous systems through database management technology to achieve the ultimate goal of transparent resource sharing. This study adopts the 5W1H approach to explore the design of the descriptive information necessary to facilitate the interoperable selection of CCTVs among heterogeneous systems.
A CCTV system may include a number of CCTVs deployed at different places to monitor selected real-world objects. When installing an individual CCTV, the designer starts by selecting the specific deployment location, visually inspects the visible areas of the captured image, and continuously adjusts the camera until the objects of interest are clearly visible (with satisfactory image resolution). Whenever a disaster is reported, the emergency response agencies must quickly find and locate all the available CCTVs to verify whether the disaster has indeed occurred. If it has, the CCTVs that provide direct visual inspection are added to the monitoring list and operate continuously until the disaster situation is resolved. As the selection of CCTVs depends heavily on whether the disaster is visible from the CCTV, this paper proposes a 3D approach that combines the CCTV 3D FOV with 3D city model data to improve CCTV selection decisions. Figure 1 illustrates the conceptual framework of the CCTV resource sharing mechanism envisioned in this study. Standardised metadata enables the creation of comprehensive descriptions for the various properties of heterogeneous CCTV systems, and a CCTV 3D selection strategy is developed based upon metadata databases and indexes for recommending CCTVs that can meet the visual demands of disaster incidents. With such integration, we aim to provide a new approach to remove the obstacles of heterogeneity and the lack of 3D consideration in current CCTV systems.
ISPRS Int. J. Geo-Inf. 2021, 10, x FOR PEER REVIEW
The rest of this paper is organised as follows: Section 2 explains the proposed strategy for CCTV selection from the 3D perspective, Section 3 follows the 5W1H viewpoint to analyse the required standardised metadata to facilitate the discovery and sharing of a heterogeneous CCTV service, Section 4 uses four scenarios to demonstrate the applications of the proposed mechanism, and finally, Section 5 concludes the major findings of this study and explores the future directions of research.
This paper mainly focuses on the technological issues for 3D FOV modelling and CCTV sharing mechanism; other issues like infrastructure, institution and legal aspects are beyond the scope of this paper and will not be included.

3D Perspective of CCTV Field of View
To facilitate the best usage of existing CCTV systems, an essential requirement is to obtain the FOV information of each CCTV, such that emergency response commanders can promptly assess and select the CCTVs useful for providing information about the disaster that occurred in reality. This section intends to discuss how the "visible" issue is solved by the proposed 3D FOV approach.
Field of view is defined as the spatial extent visible from a specific location [27]. Depending on whether the image range is fixed or not, two major types of CCTV cameras can be identified. CCTV cameras with a fixed range are widely adopted due to their relatively low prices, while CCTVs that can adjust the visualised area through zoom and rotation operations can get a better or wider shot of the objects of interest. Based on the deployed location and hardware specification of a CCTV, its FOV denotes the maximum spatial extent from which it can collect visual information. Such information is necessary for determining whether an object can be seen from the CCTV. In terms of visibility, another factor that must be considered is the ability to operate at night. For CCTVs without night vision capability, nothing can be observed at night even if they are deployed at a perfect place. This implies that the FOV information is time-dependent and does not remain the same at all times.
To aid the selection of CCTVs across heterogeneous systems, an essential requirement is that every CCTV camera must provide its location and FOV information in a standard way. The simplest way of showing the geographic distribution of CCTVs is to represent each CCTV as a point symbol on a 2D reference map. However, this presentation lacks information about the area covered by the CCTVs, so it can be used only for excluding CCTVs that are not in the neighbourhood of the reported incidents. An improved approach is to add a circular sector that emulates the FOV of each CCTV. Although this approach can help to remove the CCTVs not pointing towards the location of the reported incidents, the 2D map overlay (Figure 2) can merely illustrate the locational relationship between the hypothesised 2D FOV of CCTVs and neighbouring objects without considering many factors that may affect visibility, e.g., the "deployed height", "pitch angle" and "obstruction of objects". All three factors are important in determining whether the objects of interest can be seen [28].
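As a minimal sketch of the 2D circular-sector test described above, the following Python function checks whether an incident location falls inside a hypothesised 2D FOV. The function name, the parameter names and the convention of measuring the heading counter-clockwise from the +x axis are assumptions made for illustration only; they are not taken from any existing CCTV system. The test deliberately ignores deployed height, pitch angle and obstruction, which is exactly the limitation of the 2D overlay discussed above.

```python
import math


def in_sector_2d(cam_xy, heading_deg, half_angle_deg, radius, target_xy):
    """Return True if target_xy lies inside the circular sector of a camera.

    cam_xy         : (x, y) of the camera in a planar coordinate system
    heading_deg    : viewing direction, counter-clockwise from the +x axis
    half_angle_deg : half of the horizontal opening angle of the sector
    radius         : maximum viewing distance (same unit as the coordinates)
    """
    dx = target_xy[0] - cam_xy[0]
    dy = target_xy[1] - cam_xy[1]
    dist = math.hypot(dx, dy)
    if dist > radius:
        return False          # too far away
    if dist == 0.0:
        return True           # target coincides with the camera position
    bearing = math.degrees(math.atan2(dy, dx))
    # Smallest signed angular difference, mapped into [-180, 180)
    diff = (bearing - heading_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= half_angle_deg


# Hypothetical camera at the origin looking along +x with a 60 degree sector
camera = (0.0, 0.0)
print(in_sector_2d(camera, 0.0, 30.0, 50.0, (10.0, 2.0)))
print(in_sector_2d(camera, 0.0, 30.0, 50.0, (10.0, 10.0)))
```

A CCTV that passes this test is only a candidate: whether the incident is actually visible still depends on the three factors listed above.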
In this research, we argue that the analysis of 3D FOV can better determine the visibility relationship between CCTVs and objects in reality. The standardised recording of such information will enable the rapid and correct selection of CCTVs among heterogeneous systems during disaster response tasks.
As CCTV systems are deployed for continuously monitoring and recording surrounding events for particular purposes, their deployed location must be planned beforehand to ensure the objects of interest are visible from the CCTVs [29]. The major breakthrough is therefore based on the feature-based visibility analysis between CCTVs and real-world objects from the 3D perspective. Three major issues are discussed in the following accordingly.

Parameters of 3D FOV about CCTV Camera
Based on the hardware specification, every CCTV has a specific FOV that limits its maximum visual coverage. A CCTV camera uses a lens and an image sensor to record images or videos of real-world objects. Since the content recorded in image and video format is generally rectangular, the geometry of the FOV can first be modelled as a three-dimensional volume with four corners, similar to a pyramid. The geometry of this pyramid is mainly determined by the format of the sensor, the focal length of the lens and the angle of view (Figure 3). Parameters that may affect the FOV include:

• Sensor: The image sensor format is defined by the height (denoted by h) and the width (denoted by w) of the sensor. For example, the diagonal length of the Sony IMX482LQJ (https://www.sony-semicon.co.jp/products/common/pdf/IMX482LQJ_Flyer.pdf, accessed on 12 July 2021) is 12.86 mm (type 1/2"), and the aspect ratio is 16:9, so the parameters w and h can be determined accordingly.

• Focal length: The focal length (denoted by f) is defined as the distance between the sensor and the lens. The focal length of the camera determines its capability of magnification. A lens with a shorter focal length "sees" a wider view of a scene, while a lens with a longer focal length has a narrower view of the scene but a higher level of magnification for a clearer and more detailed recording of the observed phenomena. When the focal length is varied, its value is determined by the size of the photosensitive element (width or height), the far effective range (D) and the size of the object (h0) to be measured.

• Resolution: Display resolution refers to the number of pixels in each dimension that can be displayed. Every camera model has its own designed resolution, e.g., 1080P, 480 TV lines or 4K. The basic assumption is that the objects of interest must be clearly identifiable at the resolution (pixels) available at maximum magnification.

• Angle of view: The angle of view α is the vertical angle of the camera's coverage.

• Pyramid of vision: A pyramid-shaped 3D volume represents the maximum visual coverage observable from the CCTV. While we can visually inspect the objects "in front of" the camera, visible objects are "clipped" by the boundary of the pyramid.

• Distance: The far effective range D defines the distance from the lens to the object. Objects located within the distance D can be seen clearly from the CCTV. According to [29], h0 is defined as the height of the object (the average height of a human), and p is the ratio of the height of a person in the image to the height of the screen, expressed as a percentage. Based on the three parameters D, h0 and p, the value of α can be calculated.

• Zoom in/out: A zoom operation is a basic camera operation for changing the value of the focal length. The "zoom-in" operation enlarges the depiction of objects and provides detailed visual information; the "zoom-out" operation provides a broader view of the neighbouring environment. CCTVs with zoom capability allow operators to flexibly adjust the visual presentation of objects within the FOV.

• Rotation: The rotation of the CCTV camera determines the maximum horizontal and vertical range of visual coverage.
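The relationship between the sensor and focal-length parameters listed above can be sketched as follows. The sketch assumes the standard pinhole relation, angle = 2·atan(dimension / (2·f)), which is a common textbook approximation and not necessarily the exact formulation used in [29] or [30]. The function names and the 8 mm focal length are hypothetical; only the 12.86 mm diagonal and the 16:9 aspect ratio come from the sensor example in the text.

```python
import math


def sensor_size(diagonal_mm, aspect_w=16, aspect_h=9):
    """Derive sensor width w and height h (mm) from its diagonal and aspect ratio."""
    scale = diagonal_mm / math.hypot(aspect_w, aspect_h)
    return aspect_w * scale, aspect_h * scale


def angle_of_view_deg(sensor_dim_mm, focal_length_mm):
    """Angle of view (degrees) along one sensor dimension.

    Assumption: ideal pinhole model, angle = 2 * atan(dim / (2 * f)).
    """
    return math.degrees(2.0 * math.atan(sensor_dim_mm / (2.0 * focal_length_mm)))


# Sensor example from the text: 12.86 mm diagonal, 16:9 aspect ratio
w, h = sensor_size(12.86)
f = 8.0  # hypothetical focal length in mm, chosen only for illustration
print(f"w = {w:.2f} mm, h = {h:.2f} mm")
print(f"vertical angle of view   = {angle_of_view_deg(h, f):.1f} deg")
print(f"horizontal angle of view = {angle_of_view_deg(w, f):.1f} deg")
```

Under this model, halving the focal length widens the angle of view, which matches the qualitative description of short and long focal lengths given in the list.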

To determine the spatial extent of the pyramid, Altahir et al. [30] propose a methodology based on the parameters f, w, h, D, α and h0. This enables the determination of the maximum visual coverage of a CCTV, i.e., the 3D FOV, based on its hardware specification and the parameters related to the observed objects, h0 and p.
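To make the geometry concrete, the following sketch shows one possible way to derive α from D, h0 and p and to construct the corner points of the FOV pyramid. It is not the methodology of [30]. It assumes that, if an object of height h0 occupies a fraction p of the image height at distance D, the full vertical extent seen at that distance is h0 / p, so that α = 2·atan(h0 / (2·p·D)). The yaw and pitch angles, the coordinate convention (z up, yaw counter-clockwise from +x) and all function names are assumptions introduced for illustration.

```python
import math


def vertical_angle_deg(D, h0, p):
    """Vertical angle alpha (degrees) from far range D, object height h0 and
    the fraction p (0 < p <= 1) of the image height the object should occupy.

    Assumption: the visible vertical extent at distance D equals h0 / p.
    """
    return math.degrees(2.0 * math.atan(h0 / (2.0 * p * D)))


def fov_pyramid(apex, yaw_deg, pitch_deg, hfov_deg, vfov_deg, D):
    """Return the four far-plane corners of the FOV pyramid.

    apex      : (x, y, z) position of the camera (the apex of the pyramid)
    yaw_deg   : viewing direction, counter-clockwise from +x (assumed convention)
    pitch_deg : elevation of the optical axis, negative when looking down
    D         : far effective range measured along the optical axis
    """
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    fwd = (math.cos(pitch) * math.cos(yaw), math.cos(pitch) * math.sin(yaw), math.sin(pitch))
    right = (math.sin(yaw), -math.cos(yaw), 0.0)
    up = (-math.sin(pitch) * math.cos(yaw), -math.sin(pitch) * math.sin(yaw), math.cos(pitch))
    half_w = D * math.tan(math.radians(hfov_deg) / 2.0)
    half_h = D * math.tan(math.radians(vfov_deg) / 2.0)
    corners = []
    for sx, sy in ((-1, -1), (1, -1), (1, 1), (-1, 1)):
        corners.append(tuple(
            apex[i] + D * fwd[i] + sx * half_w * right[i] + sy * half_h * up[i]
            for i in range(3)
        ))
    return corners


# Hypothetical camera 6 m above ground, looking along +x and 20 degrees downwards
alpha = vertical_angle_deg(D=30.0, h0=1.7, p=0.2)
for corner in fov_pyramid((0.0, 0.0, 6.0), 0.0, -20.0, 1.6 * alpha, alpha, 30.0):
    print(tuple(round(c, 2) for c in corner))
```

The apex together with the four returned corners defines the pyramid of vision; storing these five points in the CCTV metadata would be one way to exchange the 3D FOV between systems.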

3D Visibility Analysis after Considering Obstruction
The 3D FOV representation only presents an ideal scenario for the use of CCTV. In an urban environment, what can really be observed with a CCTV also depends on the environment surrounding its deployed location. In Figure 4, two points (P and P') located on the same line of sight are projected to the same point in the captured image and cannot be distinguished from each other. This implies that for every line of sight, the object closest to the CCTV may obscure the objects behind it. Figure 5 illustrates the simplest scenario of 3D FOV analysis. Assuming the CCTV is located at the apex of the pyramid, the area shown in green represents the area on the ground that can be observed from the CCTV. Such an area can also be interpreted as the "intersection" of the 3D FOV and the ground surface. After a 3D object is added, the area shown in red in Figure 6 indicates the area that is no longer visible because of the object. Two types of "invisible region" can be identified. The first type is on the object itself, because only the parts facing towards the CCTV are visible. Although the object is determined to be visible, many parts of the object are actually not visible due to its 3D nature. The second type of invisible area is the shaded area behind the object, meaning anything located within this area is not visible either. This obstruction effect is, in fact, not restricted to the projection on the ground. The obstruction should rather be represented by a 3D geometry determined by the locations of the CCTV and the objects. The three buildings in Figure 7 further illustrate the influence of obstruction. Building A is the closest building to the CCTV, followed by buildings B and C. A part of building B is obstructed by building A, while building C is totally invisible due to the existence of building B, despite the fact that both building B and building C are located within the 3D FOV of the CCTV.
Since this obstruction effect cannot be explained with 2D data, 3D FOV analysis is hence necessary. The following findings can be summarised.

• Due to the volume of the objects, only subparts, i.e., the parts facing towards the CCTV, can be observed.

• The obstruction effect further restricts the objects that can be visually inspected from the CCTV. This information is very useful for CCTV selection in the urban environment.

• The more complete the objects that are involved, the more complex and fragmented the visible parts are.

• Unless the location of the CCTV changes, the influence of obstruction remains the same even with zoom and rotation operations.

• Only CCTVs located at different places may provide visual images of the obstructed regions. It is therefore advantageous to be able to find alternative CCTVs from other CCTV systems.
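The obstruction effect summarised above can be illustrated with a simple line-of-sight test. The sketch below is a simplification and not the visibility analysis implemented in this study: buildings are approximated as axis-aligned boxes, and a target point is considered visible if the straight segment from the CCTV to the target does not pass through any box (the standard slab method for segment-box intersection). The coordinates of the camera and of buildings A, B and C are hypothetical values loosely modelled on the configuration of Figure 7, and the test does not check whether the target lies inside the FOV pyramid.

```python
def segment_hits_box(p0, p1, box_min, box_max, eps=1e-9):
    """Slab test: does the segment p0 -> p1 intersect the axis-aligned box?"""
    t_enter, t_exit = 0.0, 1.0
    for axis in range(3):
        d = p1[axis] - p0[axis]
        if abs(d) < eps:
            # Segment is parallel to this slab: it must start inside it
            if p0[axis] < box_min[axis] or p0[axis] > box_max[axis]:
                return False
        else:
            t0 = (box_min[axis] - p0[axis]) / d
            t1 = (box_max[axis] - p0[axis]) / d
            if t0 > t1:
                t0, t1 = t1, t0
            t_enter = max(t_enter, t0)
            t_exit = min(t_exit, t1)
            if t_enter > t_exit:
                return False
    return True


def is_visible(camera, target, boxes, shrink=1e-6):
    """True if no box blocks the line of sight from camera to target.

    The segment is shortened slightly so that a target lying on a building
    facade is not reported as blocked by that same building.
    """
    end = tuple(c + (1.0 - shrink) * (t - c) for c, t in zip(camera, target))
    return not any(segment_hits_box(camera, end, lo, hi) for lo, hi in boxes)


# Hypothetical scene: camera at 10 m height, buildings A, B, C along the +x axis
camera = (0.0, 0.0, 10.0)
buildings = [
    ((10.0, -3.0, 0.0), (14.0, 3.0, 8.0)),    # building A (closest, lowest)
    ((20.0, -6.0, 0.0), (24.0, 6.0, 20.0)),   # building B
    ((30.0, -4.0, 0.0), (34.0, 4.0, 15.0)),   # building C (behind B)
]
print(is_visible(camera, (20.0, 0.0, 18.0), buildings))  # upper facade of B
print(is_visible(camera, (20.0, 0.0, 2.0), buildings))   # lower facade of B
print(is_visible(camera, (30.0, 0.0, 10.0), buildings))  # facade of C
```

In this configuration the upper part of building B is visible, its lower part is hidden behind building A, and the tested point on building C is hidden behind building B, which mirrors the qualitative findings listed above. Sampling many target points per feature would give an estimate of how much of each feature a given CCTV can see.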

The rapid development of 3D modelling technology has enabled the establishment of 3D city model data in recent years, such that many urban phenomena, especially man-made features like buildings, are available in a 3D fashion. By integrating the 3D feature-based city model data with the 3D FOV of CCTVs, the whole 3D space can be subdivided into two major categories, namely, the space visible and invisible to the chosen CCTVs. This is especially important during CCTV selection because the densely distributed buildings may very likely obstruct many areas that we may naively assume are visible with only 2D FOV data. Sometimes a CCTV close to the reported incidents may not provide any visual clues due to the obstruction, but another CCTV a few blocks away may provide important visual reference. Based on the 3D representation of the city model data and the 3D FOV of CCTVs, the visibility analysis helps commanders to quickly and precisely narrow down the candidate list of CCTVs.
To improve the efficiency of the proposed mechanism, the 3D FOV data must be calculated beforehand and recorded as the metadata of the CCTV (explained in Section 3). The 3D city model data also needs to be collected. The CCTV selection algorithm is based on the "filter and refinement" principle. Upon receiving a reported incident, a filtering operation is executed according to the 2D FOV representation extracted from the 3D FOV information. This aims to quickly exclude CCTVs that cannot provide a visual reference, e.g., those that are far away from or do not point to the location of the reported incident, without involving large-volume and complex geometry calculation. If thousands of CCTVs from heterogeneous systems are simultaneously considered, a spatial index, e.g., a grid or R-tree [31], can be introduced to further improve the filtering efficiency. Only CCTVs that pass the 2D filtering procedure are tested in the refinement step with the 3D FOV and the feature-based city model data. For every reported incident, commanders are presented with a list of candidate CCTVs for visual inspection reference. Commanders can freely inspect the 3D-based CCTV FOV and feature-based city model data in the 3D map interface of the GIS-based environment and pinpoint any location as a hypothesised incident to trigger the visibility analysis and acquire the list of the candidate CCTVs.
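The two-step selection can be sketched as below. This is a hedged illustration rather than the implementation of this research: the 2D FOV is assumed to be a circular sector described by a position, an azimuth measured clockwise from north, a half opening angle and a maximum range, and the names `Cctv2D`, `in_sector` and `select_cctvs` are hypothetical. The expensive 3D test is passed in as a callable so that it only runs on the CCTVs that survive the cheap 2D filter.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Incident = Tuple[float, float, float]  # x (east), y (north), z (height)


@dataclass
class Cctv2D:
    cctv_id: str
    x: float
    y: float
    azimuth_deg: float     # viewing direction, clockwise from north (assumption)
    half_angle_deg: float  # half of the horizontal opening angle
    max_range: float       # same length unit as the coordinates


def in_sector(cam: Cctv2D, px: float, py: float) -> bool:
    """Filter step: is the point inside the sector-shaped 2D FOV?"""
    dx, dy = px - cam.x, py - cam.y
    dist = math.hypot(dx, dy)
    if dist > cam.max_range:
        return False
    if dist == 0.0:
        return True
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0
    # Smallest absolute difference between the two directions, in [0, 180].
    diff = abs((bearing - cam.azimuth_deg + 180.0) % 360.0 - 180.0)
    return diff <= cam.half_angle_deg


def select_cctvs(
    cams: Sequence[Cctv2D],
    incident: Incident,
    refine: Callable[[Cctv2D, Incident], bool],
) -> List[str]:
    """Run the cheap 2D filter first, then the 3D refinement on the survivors."""
    candidates = [c for c in cams if in_sector(c, incident[0], incident[1])]
    return [c.cctv_id for c in candidates if refine(c, incident)]


cams = [
    Cctv2D("A", 0.0, 0.0, 90.0, 30.0, 100.0),
    Cctv2D("B", 200.0, 0.0, 270.0, 30.0, 100.0),  # too far away
    Cctv2D("C", 50.0, -40.0, 0.0, 20.0, 100.0),
]
# Stand-in refinement: pretend a building blocks camera C.
selected = select_cctvs(cams, (50.0, 0.0, 0.0), lambda cam, _: cam.cctv_id != "C")
```

With a spatial index in front of `in_sector`, the filter step would not need to visit every CCTV, which matters once thousands of cameras are registered.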

Visibility Based on the Feature-Based Index
The proposed approach can determine which CCTVs should be triggered for further verification by performing a 3D geometric intersection test on the location of the reported incident, the 3D FOV information of the CCTV system and the 3D feature-based city model data. As some of the reported incidents may involve particular landmarks or buildings whose visibility has already been determined beforehand, an enhanced query strategy is further proposed on the basis of individual features. According to the visibility scenarios discussed in Section 2.2, a city object is either visible or invisible to the available CCTV systems. Therefore, a feature-based index of visibility analysis can be established by recording the ID of the CCTV and the objects it can observe. Since this index records every pair of CCTVs and their visible objects, it allows retrieving candidate CCTVs that can provide a visual reference by specifying the ID of the object, as well as retrieving objects visible from a particular CCTV. The merits of the proposed index are a quick response for objects (e.g., landmarks) that commanders are familiar with and the avoidance of on-the-fly geometric calculation, since the visibility information is predetermined. The limitation is the lack of adaptability for monitoring objects whose locations continuously change. This limitation can be addressed by specifying the new location of the object as a reported incident.
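A feature-based index of this kind can be sketched as two lookup tables built from the same set of (CCTV, object) pairs. The class name `VisibilityIndex` and all IDs below are hypothetical and only illustrate the idea of querying in both directions without any geometric calculation at query time.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Set, Tuple


class VisibilityIndex:
    """Stores (CCTV-ID, object-ID) pairs and answers queries in both directions."""

    def __init__(self, pairs: Iterable[Tuple[str, str]] = ()) -> None:
        self._objects_by_cctv: DefaultDict[str, Set[str]] = defaultdict(set)
        self._cctvs_by_object: DefaultDict[str, Set[str]] = defaultdict(set)
        for cctv_id, object_id in pairs:
            self.add(cctv_id, object_id)

    def add(self, cctv_id: str, object_id: str) -> None:
        self._objects_by_cctv[cctv_id].add(object_id)
        self._cctvs_by_object[object_id].add(cctv_id)

    def cctvs_for(self, object_id: str) -> Set[str]:
        """Candidate CCTVs that can provide a visual reference to the object."""
        return set(self._cctvs_by_object.get(object_id, ()))

    def objects_for(self, cctv_id: str) -> Set[str]:
        """Objects recorded as visible from the given CCTV."""
        return set(self._objects_by_cctv.get(cctv_id, ()))


# Hypothetical precomputed visibility pairs.
index = VisibilityIndex([
    ("cctv-1", "building-10"),
    ("cctv-1", "building-11"),
    ("cctv-2", "building-10"),
])
```

The same structure could be keyed by a finer-grained ID, such as the ID of an individual surface, without changing the query logic.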
Prior discussion also indicates that an object may be only partially visible due to the obstruction of other real-world objects. Two further feature-based index strategies are therefore proposed. Since the b-rep of a 3D object can be seen as a composition of multiple surfaces, the visibility index can be extended to relate the CCTV to the individual composed surfaces instead of the whole feature. By assigning a unique ID to each composed surface, the index is designed to record the ID of the composed surface for each visible pair of object and CCTV. This improves the search granularity by avoiding the case in which a building is determined to be visible according to the building ID, but the objects of interest turn out to be located on the back side of the building. The other index is based on the geometry of the visible area on the b-rep of the objects and terrain. The shape of the visible area may be irregular due to obstruction, but its location is always attached to the surface of the object and terrain. For each visible region, a unique ID associated with the surface it attaches to is given, and its geometry is recorded. The query can therefore be executed on the basis of either the ID or the geometry. This index increases the volume of recorded data but provides the most detailed visibility information for CCTV selection.
Many reported incidents are located on the ground, but the terrain is usually represented as a continuous surface without a unique ID. By adopting an approach similar to DTM, a grid space-partition strategy is proposed to subdivide the surface of the terrain into a row-column structure. Each grid cell can then be given a unique ID, and its visibility to each CCTV can be analysed. The recorded visibility result is based on the ID and geometry of the individual grid cell and the CCTVs that can provide visual reference. Similar to the aforementioned filter and refinement strategy, this design makes it possible to first determine the grid cells the reported incident intersects (one or many cells, depending on the geometry of the reported incident) and then retrieve all the candidate CCTVs for visual inspection and validation. Table 1 lists the major considerations of the three proposed indexes.
Table 1. The recording method between the CCTV camera 3D FOV and the object.

Feature
Recorded information: Feature-ID; CCTV-ID and feature-ID
What can be determined?
• If the specified object can be seen from a particular CCTV.
• What objects can be seen from a specific CCTV.
• The candidate CCTVs that can provide a visual reference to the specified object.

The Composed Surface of the Feature
Recorded information: Composed surface-ID; CCTV-ID + composed surface-ID (extended from the feature ID)
What can be determined?
• All the functions available in the feature-ID scenario.
• The composed surfaces of features that are visible from a specific CCTV.
• The candidate CCTVs that can provide a visual reference to a specific composed surface.

The Area Visible from the CCTV
Recorded information: Feature-ID or composed surface-ID; CCTV-ID + feature-ID (or composed surface ID) + geometry
What can be determined?
• All the functions available in the feature-ID or composed surface scenarios, depending on which level of ID is chosen.
• Geometry analysis based on the location of the reported incident and the visible area.
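The grid space-partition strategy for the terrain described before Table 1 can be sketched as follows. This is an illustrative sketch under assumptions: the grid is square and axis-aligned, a cell ID is simply a (row, column) pair, the incident is approximated by its bounding extent, and the function names are hypothetical.

```python
import math
from typing import Dict, Set, Tuple

Cell = Tuple[int, int]  # (row, column) acts as the unique ID of a grid cell


def cell_of(x: float, y: float, origin_x: float, origin_y: float, cell_size: float) -> Cell:
    """Map a planar coordinate to the grid cell that contains it."""
    return (
        math.floor((y - origin_y) / cell_size),
        math.floor((x - origin_x) / cell_size),
    )


def cells_for_extent(
    min_x: float, min_y: float, max_x: float, max_y: float,
    origin_x: float, origin_y: float, cell_size: float,
) -> Set[Cell]:
    """All grid cells intersected by the bounding extent of a reported incident."""
    r0, c0 = cell_of(min_x, min_y, origin_x, origin_y, cell_size)
    r1, c1 = cell_of(max_x, max_y, origin_x, origin_y, cell_size)
    return {(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)}


def candidate_cctvs(cells: Set[Cell], visible_from: Dict[Cell, Set[str]]) -> Set[str]:
    """Union of the CCTVs recorded as seeing any of the given cells."""
    result: Set[str] = set()
    for cell in cells:
        result |= visible_from.get(cell, set())
    return result


# Hypothetical precomputed index: grid cell -> CCTVs that can see it.
visible_from = {(0, 0): {"cctv-A"}, (1, 1): {"cctv-B"}, (5, 5): {"cctv-C"}}
incident_cells = cells_for_extent(8.0, 8.0, 12.0, 12.0, 0.0, 0.0, 10.0)
candidates = candidate_cctvs(incident_cells, visible_from)
```

The cell size is a trade-off: smaller cells describe visibility more precisely but increase the number of records that must be computed and stored.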

Metadata Design to Aid Emergency Response CCTV Selection
To facilitate the best usage of the heterogeneous CCTV systems, it is necessary to introduce standardised descriptions, i.e., metadata of the CCTV, such that the mechanism for discovery, catalogue and selection of CCTV can be developed accordingly. To ensure all the essential characteristics of the CCTV are considered, the well-known 5W1H method is adopted in this research to serve as the basis for metadata design. Based on the goal of this study, only the metadata that facilitates the selection of CCTV is considered. The major considerations in this study include:
• Who: the related stakeholders that provide the CCTV services.
• What: the technical specifications of the CCTV.
• Where: the deployed parameters of the CCTV 3D FOV and objects the CCTV observes.
• Why: the intended purpose of the CCTV.
• When: the time the CCTV operation is available.
• How: the procedure to access and operate the CCTV service.
The standardised metadata proposed in this study intends to enable the emergency response commanders to quickly locate all the available CCTVs and determine their fitness for use upon reported disaster incidents. The sharing mechanism must allow users to search and interpret the CCTV information across different platforms in an interoperable way and ensure all of the CCTVs are searched and checked. Even CCTV systems that are not catalogued beforehand can be readily added to the sharing mechanism as long as their metadata follows the suggested metadata schema. By effectively integrating heterogeneous CCTV systems, this mechanism can flexibly expand its service capabilities whenever necessary, and the usage of an individual CCTV service can hence go beyond its original purpose, i.e., CCTV systems previously deployed for crime prevention or traffic monitoring can be used for disaster validation as well.

WHO
Many types of stakeholders are associated with the successful operations of a CCTV system, e.g., the agencies or organisations that are responsible for the manufacture, deployment, operation, service, maintenance and use of the CCTV systems (Figure 8). Sometimes one stakeholder may play multiple roles in the operation of CCTV systems. The design of the "Who" factor mainly considers what information about the stakeholders needs to be provided for decision making. Based on the goal of this research, the role of the stakeholder is restricted to the responsible units that can grant access permission when the CCTV streaming service is required. The unique identification of the responsible units may be based on their official names or ID (identification number). In addition to the contact information, the access of some CCTV services may require additional application procedures due to commercial, security or privacy reasons. The designed metadata must allow users to contact the responsible person and get a permit to access the requested services.

WHAT
The "What" factor mainly considers the specification of the CCTV systems that may affect the operation and content of the CCTV service. The hardware specification has a dominant influence on the performance and the data content of an individual CCTV camera. The major considerations of metadata design include the hardware model and specifications, lens (including resolution), and working environment. Models and specifications include descriptions of the manufacturer, type, weight (in kilograms), size (length, width, and height of the CCTV camera, in mm), and serial number. The lens part includes the description of the resolution, focal length, spectral band, night vision, distance, minimum object quantification, standard distance setting conditions, angle, rotatable, magnification, imaging method, and frequency. The working environment includes descriptions of temperature range, humidity range, and time range. Some metadata elements are used for direct selection reference, e.g., for finding CCTV that can work at night; some elements are used for further analysis, e.g., for calculating the 3D FOV. The major considerations for metadata design include:
a. As CCTV camera specifications will affect the image quality, metadata design must include the important parameters from the CCTV specifications.
b. The resolution of the CCTV camera determines the quality of captured images.
c. The focal length of the lens is chosen according to the required field of view. A long focal length can clearly capture distant objects, but the visible range is narrow. Conversely, a short focal length cannot capture distant objects clearly, but the visible range is wider.
d. The capability to pan/tilt enables the flexible adjustment of visual coverage.
e. Only CCTV cameras with infrared night vision functions can be used to monitor objects located in a low illumination environment.
f. The camera angle and zoom can be controlled through the control panel or computers.
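The trade-off between focal length and visible range noted in the considerations above can be made concrete with the standard angle-of-view relation for a simple lens focused at infinity, angle = 2·atan(d / 2f), where d is the sensor dimension and f the focal length. The sensor size is an assumed input here; it is not one of the metadata elements listed above, and the function name is hypothetical.

```python
import math


def angle_of_view_deg(sensor_size_mm: float, focal_length_mm: float) -> float:
    """Angle of view (degrees) along one sensor dimension for an ideal lens."""
    if sensor_size_mm <= 0 or focal_length_mm <= 0:
        raise ValueError("sensor size and focal length must be positive")
    return math.degrees(2.0 * math.atan(sensor_size_mm / (2.0 * focal_length_mm)))


# Hypothetical 4.8 mm wide sensor with a short and a long focal length.
wide = angle_of_view_deg(4.8, 2.8)
narrow = angle_of_view_deg(4.8, 12.0)
```

Applying the same relation to the sensor width and height gives the horizontal and vertical opening angles from which a pyramid-shaped 3D FOV could be derived.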


WHY
Every CCTV system has its intended purposes and applications. Although the commanders can freely select any CCTV that may be useful to aid their decisions, descriptions about why the CCTV systems are deployed may also be useful reference information. An easy way is to ask the responsible units to provide a paragraph of free text to explain the deployed purpose of the CCTV systems. However, the lack of agreement on the keywords used by the responsible units may become an obstacle to developing an effective searching mechanism. A code list composed of a number of chosen keywords provides a possible solution, but the selection of keywords may require a great amount of discussion and coordination (e.g., limited to a particular domain like disaster management).

WHERE
The "Where" factor is a mandatory consideration for the selection of CCTV because it determines if an object is visible from a particular CCTV. Since CCTV systems are deployed beforehand, the visual coverage is also predetermined. The key challenge is to determine the visibility relationship between CCTVs and the objects of interest. The "Where" aspect involves three major considerations:

a. Location and posture parameters of the CCTV
When managed with GIS, the location of CCTVs is mandatory information for showing the geographic distribution of CCTVs in maps, as well as serving as the basis for analysing their FOV. Whether the location is recorded by 2D or 3D coordinates also leads to differences in the results of the visibility analysis. The recording of location information requires the specification of the coordinate reference system and the determination of the corresponding coordinates of the deployed location of the CCTV, often represented by two planar coordinates and one height value. The posture parameters of the CCTV are the rotation angles (pitch, roll and yaw) with respect to the axes of the CCTV camera. The range of pitch angle values is −90° (looking straight down) to 90° (looking straight up to the sky). The recorded roll value is the angle by which the CCTV camera is rotated left or right in the field of view. A positive value means that the CCTV camera is rolled to the right; a negative value means that it is rolled to the left. In addition to the quantitative representation approach, the location of the CCTV can also be represented by text, e.g., the names of facilities, landmarks, road intersections, neighbourhoods or even specific road mileage, to provide meaningful location reference information to the commanders.
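As a hedged illustration of how the posture parameters relate to the viewing direction, the sketch below converts pitch and yaw into a unit vector. The axis convention is an assumption that the text does not fix: x points east, y points north, z points up, and yaw is measured clockwise from north. Roll rotates the image about the viewing axis and therefore does not change this direction. The function name is hypothetical.

```python
import math
from typing import Tuple


def view_direction(pitch_deg: float, yaw_deg: float) -> Tuple[float, float, float]:
    """Unit vector of the camera's viewing axis (x east, y north, z up)."""
    if not -90.0 <= pitch_deg <= 90.0:
        raise ValueError("pitch must be within [-90, 90] degrees")
    pitch = math.radians(pitch_deg)
    yaw = math.radians(yaw_deg)
    horizontal = math.cos(pitch)  # length of the projection on the ground plane
    return (
        horizontal * math.sin(yaw),
        horizontal * math.cos(yaw),
        math.sin(pitch),
    )


straight_down = view_direction(-90.0, 0.0)
due_east = view_direction(0.0, 90.0)
```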

b. Field of View
After the location and posture parameters of the CCTV are determined, the 3D FOV of the CCTV can be determined accordingly. As discussed in Section 2, a 3D FOV can be generated at the location of the CCTV and oriented towards the direction specified by the major axis of the CCTV camera. The geometry of the 3D FOV can be represented by a multi-surface object based on the b-rep approach. This presents a hypothesised stereoscopic range within which the chosen CCTV can visually inspect the objects. When the FOVs of multiple CCTVs are determined and combined with set operations, the whole 3D space is subdivided into two categories according to whether the space is visible with the current CCTV systems. The union of the 3D FOVs of all CCTV systems represents the aggregated spatial extent of the integrated cross-discipline CCTV systems. In contrast, the geometric intersection of the 3D FOVs represents the area that is observable from at least two CCTVs. The merit of such integration is that the selected CCTVs may come from different systems but can be effectively integrated to provide a much wider visual coverage than what a single CCTV system can provide. The multi-angle viewpoints are especially useful for inspecting, validating and assessing real-time threats or damage.

c. The objects in reality
From the viewpoint of observed objects, it is certainly advantageous to know whether the available CCTVs can provide visual information about the object of interest, and if so, by which CCTV. The objects observable by a particular CCTV can be determined beforehand if both the 3D information of the CCTV FOV and the city model data (e.g., buildings) are available. Such visibility information between CCTV and objects is recorded as an information pair based on the identification information (e.g., CCTV-ID, object names or ID). The index between CCTVs and objects then allows the query mechanism to quickly search for applicable CCTVs based upon the object names or ID. The level of completeness, the level of detail of the objects and even the dynamic changes of objects caused by hazards are all factors that may influence the visibility analysis.
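The set operations on FOVs mentioned under "Field of View" can be illustrated on a discretised representation. This sketch assumes, purely for illustration, that the space visible from each CCTV has already been reduced to a set of cell (or voxel) IDs; real b-rep geometry would need a 3D geometry library instead. The function name and IDs are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Set, Tuple


def coverage_summary(
    visible_cells: Dict[str, Set[Hashable]],
) -> Tuple[Set[Hashable], Set[Hashable]]:
    """Return (cells seen by any CCTV, cells seen by at least two CCTVs)."""
    counts = Counter(cell for cells in visible_cells.values() for cell in cells)
    covered = set(counts)
    multi_covered = {cell for cell, n in counts.items() if n >= 2}
    return covered, multi_covered


# Hypothetical visibility sets of three CCTVs from different systems.
covered, multi_covered = coverage_summary({
    "cctv-1": {"c1", "c2", "c3"},
    "cctv-2": {"c3", "c4"},
    "cctv-3": {"c4", "c5"},
})
```

Cells outside `covered` form the space that no registered CCTV can observe, while `multi_covered` marks where multi-angle viewpoints are available.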

WHEN
For using CCTV in emergency response tasks, one critical requirement is whether the CCTV service is available at the time when needed. This information can be used for either assessing the real-time monitoring or rebuilding the situation of past events. The "When" factor of CCTV systems mainly considers the various types of temporal information related to CCTV, including installation, shooting time and data service.

a. Installation
Data recording by CCTV is available only after it is installed. Although only used for reference, the time of installation can determine if a CCTV is available for a particular given date when the historical recording is demanded.

b. Operation time
Some CCTV systems only work during specified periods of time, meaning even if the CCTV systems fulfil all the other types of constraints, it still does not guarantee the recording data will be available. This description is also related to the types of CCTV, e.g., a camera without a night vision function will not work at night.

c. Service time
Many CCTV services provide real-time access via the internet. This description explains the time the online services are available for access. The ideal situation would be non-stop service without any interruption.

d. Historical archive
In addition to the real-time access, commanders may need to request recordings from the past. This metadata element explains whether there is a historical archive and how long the recordings will be kept. As such a service differs from real-time access, where users can easily connect with the provided URL, an additional access interface that allows temporal constraints to be specified is necessary.
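The temporal elements above can be combined into simple availability checks. The sketch below is an assumption-laden illustration: the field names are hypothetical, the operation time is modelled as a single daily window (which may span midnight), and the archive is modelled by a retention period only.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import Optional


@dataclass
class TemporalMetadata:
    installed_on: datetime
    operation_start: time                  # daily operation window
    operation_end: time
    archive_retention: Optional[timedelta]  # None when no historical archive exists


def was_recording(meta: TemporalMetadata, moment: datetime) -> bool:
    """Was the CCTV installed and within its operation window at the given moment?"""
    if moment < meta.installed_on:
        return False
    t = moment.time()
    if meta.operation_start <= meta.operation_end:
        return meta.operation_start <= t <= meta.operation_end
    # The operation window spans midnight, e.g., 20:00 to 06:00.
    return t >= meta.operation_start or t <= meta.operation_end


def archive_available(meta: TemporalMetadata, moment: datetime, now: datetime) -> bool:
    """Can a past recording of the given moment still be requested?"""
    if meta.archive_retention is None or moment > now:
        return False
    return was_recording(meta, moment) and now - moment <= meta.archive_retention


daytime_cam = TemporalMetadata(
    installed_on=datetime(2020, 1, 1),
    operation_start=time(6, 0),
    operation_end=time(18, 0),
    archive_retention=timedelta(days=30),
)
```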

HOW
Information that instructs the commanders on how to access the selected CCTV service is necessary. Many CCTV systems are now publicly available, such that the URL of the online streaming service alone can already meet the access demands. Some systems are protected due to restrictions enforced by the system owners. Under such circumstances, additional procedures may be required to apply for access to the recordings of the CCTV systems. The metadata related to the HOW aspect is designed to explain whether the service is available, the online link of the service and the instructions for accessing the service.
The above 5W1H perspectives are used to analyse the required metadata elements that can provide descriptive information for available CCTV systems, such that commanders can take advantage of all the available CCTV services, regardless of their heterogeneity, for making prompt emergency response decisions. An effective sharing mechanism would require the stakeholders of all the CCTV systems to establish standardised metadata beforehand so that the metadata-based mechanism is ready for CCTV querying when emergency situations occur. Commanders can specify various types of constraints on the metadata, including 3D FOV analysis, to discover the candidate set of CCTVs for their decision reference. Whether the CCTV streaming service is available to the commanders, however, still depends on whether a collaborative relationship between the EOC and the authorised stakeholders of the CCTV systems can be established. Based on the analysed results, Figure 9 illustrates the UML schema of the proposed metadata. The proposed UML schema consists of 6 classes and 41 elements. The class CAMERA is designed to describe the various parameters of the CCTV hardware specification. The class 3D FOV is associated with the class CAMERA to provide the information of its field of view. Its elements include the necessary parameters for calculating the 3D FOV. It also allows recording the 3D FOV as a multi-surface geometry. For recording the visibility relationship between the CCTV and objects, the classes OBJECT and B-rep are used to record the identification information of an object and its geometry. The class CAMERA is further associated with the classes LOCATION and SERVICE. Metadata elements of the proposed classes are based on the above 5W1H analysis of CCTV systems. All designed metadata elements must be formally defined. Tables 2 and 3 show the data dictionary of the classes Camera and 3D FOV in Figure 9.
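To give a feel for how the six classes relate, the sketch below mirrors them as plain data classes. Only the class names come from the schema described above; every field is a hypothetical placeholder drawn loosely from the 5W1H discussion, not one of the 41 elements of the actual data dictionary, and attaching the visible objects to the 3D FOV is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point3D = Tuple[float, float, float]
Surface = List[Point3D]  # one planar face given by its vertices


@dataclass
class BRep:
    surfaces: List[Surface]


@dataclass
class CityObject:  # corresponds to the class OBJECT
    object_id: str
    geometry: Optional[BRep] = None


@dataclass
class Location:
    crs: str
    x: float
    y: float
    height: float
    pitch_deg: float = 0.0
    roll_deg: float = 0.0
    yaw_deg: float = 0.0
    description: str = ""


@dataclass
class Service:
    responsible_unit: str
    contact: str
    url: Optional[str] = None
    access_instructions: str = ""


@dataclass
class Fov3D:
    horizontal_angle_deg: float
    vertical_angle_deg: float
    max_distance_m: float
    geometry: Optional[BRep] = None
    visible_objects: List[CityObject] = field(default_factory=list)


@dataclass
class Camera:
    camera_id: str
    manufacturer: str
    resolution: str
    focal_length_mm: float
    night_vision: bool
    location: Location
    service: Service
    fov: Optional[Fov3D] = None


# Hypothetical record with placeholder values.
example = Camera(
    camera_id="cam-001",
    manufacturer="ExampleCo",
    resolution="1920x1080",
    focal_length_mm=4.0,
    night_vision=True,
    location=Location(crs="EPSG:4326", x=120.0, y=23.0, height=6.0, pitch_deg=-20.0),
    service=Service(responsible_unit="Example unit", contact="contact@example.org"),
    fov=Fov3D(60.0, 40.0, 80.0, visible_objects=[CityObject("building-10")]),
)
```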

Test and Analysis
In the following discussion, 3D FOV simulation and analysis are implemented with ArcGIS Pro's Exploratory 3D Analysis module. Metadata is created with PostgreSQL. The Chengkung and Guangfu campuses at NCKU are chosen as the test site, where 20 CCTVs belonging to three different systems, namely, the city hall monitor, department monitor and campus monitor systems, are simulated. Figure 10 shows the sector-based 2D FOV of the CCTVs at the test site, with different colours indicating different systems. The metadata for each CCTV is established according to the schema proposed in Section 3. A simplified version of the metadata for a CCTV is presented in Table 4.

Scenario 1
Scenario 1 is designed to test the creation and use of the 3D FOV information of a single CCTV. Every CCTV is required to establish its metadata according to the schema proposed in Section 3. The visibility information for each CCTV is determined by the locational relationships between the 3D FOV of the CCTVs and the 3D buildings. Figure 11 shows the visibility analysis results of Camera 1. The regions marked in green and red depict the areas visible and invisible from Camera 1, respectively. With the 3D perspective added, it clearly shows that some of the areas are obstructed by the buildings, and the FOV is no longer the sector-based area shown in Figure 10. This implies that ignoring real-world objects (e.g., buildings) may give the commanders a false interpretation regarding the choice of CCTVs, and that the 2D FOV approach is not sufficient. Figure 12 shows the actual image obtained. An obvious additional advantage of introducing the 3D FOV is that the visibility information is not restricted to the ground. It now extends to the vertical dimension, such that whether a particular building or wall is visible can also be determined and recorded in the metadata. When working in a 3D GIS environment, the commander can freely rotate or enlarge the 3D scene to gain a better understanding of the 3D FOV and the area covered by the available CCTVs. The direct visual illustration provides an improved interface for inspecting the 3D FOV of a CCTV and enables the choice of the best CCTV for the task at hand. Following the proposed UML, the visibility relationship between the CCTV and buildings can be further recorded on the basis of their IDs. In this case, the index explicitly records the visibility relationship between Camera 1 and Building IDs 149741, 151522, 151531 and 148834. This implies the commander has the flexibility to query the visible buildings of a single CCTV by specifying its ID (Figure 13) and obtain the information of all objects it can observe. If the 3D information of the CCTV FOV or the buildings changes, the visibility relationship has to be reanalysed and updated in the metadata and index.

Scenario 2
Scenario 2 demonstrates the search for candidate CCTVs from a specified location or from real-world objects. When the location of a disaster incident (22°59'58.7" N 120°13'00.9" E) is reported, the selection of CCTVs starts by filtering candidates with a test of the "within" relationship between the reported location and the 2D FOV of each CCTV. In this case, 17 CCTVs are removed, and only 3 CCTVs remain on the candidate list (Figure 14). Further 3D FOV analysis in the refinement step shows that only CCTVs 2 and 3 can provide visual inspection; CCTV 1 is excluded because a building obstructs its view (Figure 15).
Another test takes Building 149741 as the constraint. A search of the index in Table 5 (based on the visible objects recorded in the metadata) finds two CCTVs that can provide visual information about Building 149741 (Table 6). Figures 16 and 17a,b, respectively, show the 3D FOV analysis and the two images captured by the two cameras. Although Building 149741 is recorded as visible by both CCTVs, the 3D FOV analysis shows that only parts of the building are visible because of obstruction. With these two CCTVs selected, the index also shows that parts of Building 151531 are visible from both CCTVs. This test clearly demonstrates the advantage of managing multiple CCTV systems through standardised metadata and the potential to provide better visual coverage than a single CCTV system alone can provide. The 3D illustration above shows the invisible, visible and multi-coverage areas in red, green and yellow, respectively. The multiple coverage areas, as Section 2.3 suggests, can be explicitly recorded. Thus, as long as the reported incident is within such an area, the information of this set of CCTVs can be retrieved directly from the database.
CCTV       Building ID
Camera1    149741
Camera1    151522
Camera1    151531
Camera1    148834
Camera2    149741
Camera2    151531
Camera2    142746
Camera2    149743
Camera2    151570
Camera2    146454
Camera3    142723
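The two-step selection in Scenario 2 (a 2D "within" filter followed by a 3D refinement) can be sketched as follows. This is an illustration under strong simplifying assumptions, not the analysis used in the paper: the 2D FOV is modelled as a circular sector, buildings are replaced by axis-aligned boxes, and visibility is reduced to a single line of sight from the camera to the target. All class names, coordinates and camera parameters are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Camera:
    cam_id: str
    position: tuple        # (x, y, z) in metres, local coordinates
    direction_deg: float   # viewing direction, counter-clockwise from the +x axis
    half_angle_deg: float  # half of the horizontal angle of view
    max_range: float       # maximum viewing distance in metres


@dataclass
class Box:
    lo: tuple  # (xmin, ymin, zmin)
    hi: tuple  # (xmax, ymax, zmax)


def within_2d_fov(cam, target):
    """Filter step: is the target inside the camera's 2D sector?"""
    dx = target[0] - cam.position[0]
    dy = target[1] - cam.position[1]
    dist = math.hypot(dx, dy)
    if dist > cam.max_range:
        return False
    if dist == 0.0:
        return True
    bearing = math.degrees(math.atan2(dy, dx))
    # Signed angular difference folded into [-180, 180).
    diff = (bearing - cam.direction_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= cam.half_angle_deg


def segment_hits_box(p0, p1, box):
    """Slab test: does the segment p0 -> p1 intersect the axis-aligned box?"""
    t_min, t_max = 0.0, 1.0
    for a, b, lo, hi in zip(p0, p1, box.lo, box.hi):
        d = b - a
        if abs(d) < 1e-12:
            if a < lo or a > hi:
                return False
        else:
            t0, t1 = (lo - a) / d, (hi - a) / d
            if t0 > t1:
                t0, t1 = t1, t0
            t_min, t_max = max(t_min, t0), min(t_max, t1)
            if t_min > t_max:
                return False
    return True


def select_cameras(cameras, target, obstacles):
    """Return (candidates after the 2D filter, cameras left after 3D refinement)."""
    candidates = [c for c in cameras if within_2d_fov(c, target)]
    refined = [
        c for c in candidates
        if not any(segment_hits_box(c.position, target, b) for b in obstacles)
    ]
    return candidates, refined


# Hypothetical layout: one box-shaped building between camera A and the target.
cameras = [
    Camera("A", (0.0, 0.0, 10.0), 0.0, 30.0, 100.0),
    Camera("B", (50.0, 60.0, 10.0), 270.0, 30.0, 100.0),
    Camera("C", (200.0, 0.0, 10.0), 180.0, 30.0, 100.0),
]
obstacles = [Box((20.0, -5.0, 0.0), (30.0, 5.0, 20.0))]
target = (50.0, 0.0, 0.0)

candidates, refined = select_cameras(cameras, target, obstacles)
print([c.cam_id for c in candidates], [c.cam_id for c in refined])
```

In this toy layout, camera C is dropped by the 2D filter because the target is out of range, and camera A is dropped in the refinement step because the box blocks its line of sight. A target lying on or inside a building would need a surface-based test rather than the single ray used here.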

Scenario 3
Since some types of CCTVs may not work at night, Scenario 3 is designed to demonstrate the different visual coverage during daytime and nighttime. When the analysed scenario takes place at night, CCTVs incapable of providing direct visual inspection can be easily excluded by specifying constraints on the night vision metadata element. Camera 3 in this scenario is a CCTV that only operates in the daytime. In the daytime, the yellow region in Figure 18 depicts the area that can be simultaneously observed from all three CCTVs. After Camera 3 is removed at night, the monitored area is reduced (Figure 19). In this scenario, buildings 142723 and 151530 can be monitored in the daytime but are not visible at night. With the introduction of standardised metadata, commanders are provided with correct visibility information, provided that the night vision metadata of all CCTV systems is created accordingly. The system can therefore adapt to the time-dependent characteristics of CCTV systems and, most importantly, commanders are always presented with the correct CCTV visual coverage. The same process can also handle situations in which a CCTV is out of service, unavailable or authorised for use only under certain circumstances.
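The day and night comparison in this scenario amounts to filtering cameras on a metadata flag before consulting the visibility index. The sketch below assumes hypothetical field names (`night_vision`, `in_service`) and assumes that Cameras 1 and 2 have night vision; the building sets for Cameras 1 and 2 follow the index table, and Camera 3's set follows the two buildings named in this scenario.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CctvMetadata:
    cam_id: str
    night_vision: bool                 # hypothetical metadata element name
    visible_buildings: frozenset
    in_service: bool = True            # hypothetical availability flag


def usable(cam, at_night):
    """A camera is usable if it is in service and can see at the given time."""
    return cam.in_service and (cam.night_vision or not at_night)


def monitored_buildings(cameras, at_night):
    """Union of the buildings visible from all usable cameras."""
    result = set()
    for cam in cameras:
        if usable(cam, at_night):
            result |= cam.visible_buildings
    return result


cameras = [
    CctvMetadata("Camera1", True, frozenset({149741, 151522, 151531, 148834})),
    CctvMetadata("Camera2", True,
                 frozenset({149741, 151531, 142746, 149743, 151570, 146454})),
    CctvMetadata("Camera3", False, frozenset({142723, 151530})),  # daytime only
]

day = monitored_buildings(cameras, at_night=False)
night = monitored_buildings(cameras, at_night=True)
print(sorted(day - night))
```

The same predicate could be extended with further metadata constraints, such as authorisation, without changing the index itself.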

Scenario 4
When a disaster incident is reported, the selection of CCTVs is determined according to the reported location and the 3D FOV of each CCTV. When the location of the incident is arbitrarily specified, the above visibility index cannot be used directly because the specified location may not refer to an existing object, so visibility has to be determined by real-time analysis. If the reported location is within the green or yellow area, it is visible from at least one CCTV. If it is in the red area or not within the FOV of any CCTV, it is not visible. With grid indexing, the visibility between the CCTVs and the grid cells can be computed beforehand (Figure 20). The analysis can then be simplified to a lookup of the grid cell in which the reported incident is located. This design avoids complicated real-time analysis by determining the visible grid cells in advance. Similar to the feature-based index, the grid-based index serves as a mechanism for filtering out unrelated grid cells and narrowing down the candidate cells and CCTVs for further refinement analysis.
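A grid-based index of this kind can be sketched as a dictionary from grid cells to the set of cameras that see each cell, filled once in advance and then queried by location. The cell size, the camera positions and the `is_visible` predicate below are all hypothetical; in particular, the distance check only stands in for the 3D visibility analysis, which is not reproduced here.

```python
import math

CELL_SIZE = 10.0  # metres; hypothetical grid resolution


def cell_of(x, y, cell_size=CELL_SIZE):
    """Map a planar coordinate to its (column, row) grid cell."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def build_grid_index(cells, cameras, is_visible, cell_size=CELL_SIZE):
    """Precompute, for every cell centre, the cameras that can see it."""
    index = {}
    for col, row in cells:
        cx, cy = (col + 0.5) * cell_size, (row + 0.5) * cell_size
        index[(col, row)] = {
            cam_id for cam_id, cam in cameras.items() if is_visible(cam, cx, cy)
        }
    return index


def cameras_for_location(index, x, y, cell_size=CELL_SIZE):
    """Query step: look up the cell of the reported location."""
    return index.get(cell_of(x, y, cell_size), set())


def coverage_class(cam_ids):
    """Colour coding used in the text: red, green or yellow."""
    if not cam_ids:
        return "red"      # invisible
    if len(cam_ids) == 1:
        return "green"    # visible from one CCTV
    return "yellow"       # multiple coverage


def is_visible(cam, x, y):
    # Stand-in predicate: within 30 m of the camera (NOT a real 3D analysis).
    return math.hypot(x - cam[0], y - cam[1]) <= 30.0


cameras = {"Camera1": (0.0, 0.0), "Camera2": (40.0, 0.0)}
cells = [(col, 0) for col in range(5)]
grid_index = build_grid_index(cells, cameras, is_visible)

for point in [(12.0, 3.0), (47.0, 2.0), (100.0, 100.0)]:
    cams = cameras_for_location(grid_index, *point)
    print(point, sorted(cams), coverage_class(cams))
```

Because each cell is represented only by its centre, the lookup is a filter: the cameras it returns would still go through the refinement analysis for the exact reported location.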

Scenario 5
The outcomes of the proposed approach rely heavily on the analysis of the visibility relationship between the 3D FOV and the 3D city model data. While the 3D FOV information often remains unchanged, the phenomena available from the 3D city model data become an important factor for the visibility outcomes. The more detailed the information introduced into the analysis, the more accurate the visibility outcomes. Among the various types of phenomena in the urban environment, the impact of trees must be considered. Urban afforestation provides citizens with a healthy and comfortable living environment, but it may also substantially obstruct the objects viewable from the CCTVs. Especially along boulevards, the visual coverage may be considerably reduced once trees are considered. Figure 21a,b, respectively, show the 3D FOV analysis without and with the impact of trees considered. In Figure 21a, the wall of building 7884 is visible, but only about half of it is visible from both CCTVs. The selection of both cameras appears to be a reasonable choice. However, after trees are considered, the multiple coverage on the wall of building 7884 drops almost to 0 and the visible part is also significantly reduced, with only one CCTV now able to provide visual aid. Under such circumstances, the two CCTVs are no longer a good choice. When other types of objects are additionally considered, the visibility of an object and its component surfaces may deteriorate from visible to invisible, or from multiple coverage to single coverage or even invisible. Based on the proposed 3D FOV analysis, all of this can be evaluated and recorded in a quantitative way. Since most real-world objects are made of non-transparent materials, they are visual obstacles that may cause all kinds of obstruction. This also demonstrates the importance of the content of the 3D city database to the proposed mechanism.
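One way to express such a quantitative evaluation is to sample points on a wall, record which cameras see each point, and report the visible and multiply covered fractions. The sketch below assumes this sampling approach, which the text does not specify, and the per-sample camera sets are made-up illustrative values rather than results from the study.

```python
def coverage_stats(samples):
    """Summarise coverage of a surface from per-sample camera sets.

    samples: list of sets, one per sample point on the surface, each holding
    the IDs of the cameras that can see that point.
    """
    total = len(samples)
    if total == 0:
        raise ValueError("at least one sample point is required")
    visible = sum(1 for cams in samples if len(cams) >= 1)
    multiple = sum(1 for cams in samples if len(cams) >= 2)
    return {"visible": visible / total, "multiple": multiple / total}


# Illustrative values only: ten sample points on one wall.
without_trees = [{"A", "B"}] * 5 + [{"A"}] * 5
with_trees = [{"A"}] * 6 + [set()] * 4

print(coverage_stats(without_trees))
print(coverage_stats(with_trees))
```

Comparing the two dictionaries shows how adding an obstacle class to the city model can turn multiple coverage into single coverage and reduce the visible share of a surface.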

Conclusions
For monitoring continuously changing real-world phenomena, the real-time direct visual observation obtained from CCTVs is unique and precious. However, without the ability to manage such information effectively, existing CCTV systems cannot be used to full advantage, especially for emergency response, which may demand instantaneous observations for making action decisions. This research has a dual purpose. On the one hand, we hope to expand the scope of CCTV sharing; on the other hand, we aim to smartly recommend CCTVs that fit the emergency response at hand. After extensively examining the relevant factors, we proposed a metadata-driven approach for standardising the descriptions of CCTVs from the 5W1H perspective. Based on the comprehensive design of metadata elements, we demonstrated that the search, selection and access of CCTVs for reported incidents can be completed via various types of constraints on the metadata elements of location, authority, time and CCTV specifications in an interoperable way, regardless of who operates the systems. The standardised metadata therefore removes the obstacles to managing heterogeneous CCTV systems, enables the development of an integrated sharing mechanism for CCTV services and recommends a candidate set of CCTVs from the available CCTV systems. The capacity for aiding emergency response with CCTV can be progressively expanded as more CCTV system owners become willing to create metadata complying with the proposed schema. The proposed approach therefore serves to bridge the information gap between the EOC and the various CCTV systems owned by different stakeholders. Among the 5W1H perspectives, "where" serves as the essential consideration for determining visibility between CCTVs and observed objects.
Compared with the traditional management mechanism of CCTV systems, the CCTV recommendation strategy in this research is enhanced from the 3D perspective by adding the vertical dimension. We combined the 3D FOV of CCTVs with the 3D city model data and proposed using the visibility relationship between a CCTV and the objects it observes as the key of a feature-based index. We demonstrated that 3D visibility analysis yields significantly different outcomes from the 2D approach because it takes obstruction into consideration, a mandatory factor in determining visibility. This index serves queries of CCTVs based on constraints on the features or coordinates of reported incidents, as well as on the CCTV specifications (e.g., CCTV type, night vision). In addition, the areas with no coverage or with multiple coverage by the currently available CCTV systems can also be readily determined. As an indispensable component of smart city and smart disaster management applications, the proposed approach is shown to be able to integrate independent and heterogeneous CCTV systems and facilities for multi-purpose use. The addition of the 3D FOV also proves valuable for recommending the CCTVs that best support the commanders' need for direct visual information. Since the major concern is to determine whether an object is visible, this paper mainly focused on the geometric aspect. With the comprehensive design of metadata, more knowledge about CCTV selection can be developed in future studies on the basis of the proposed metadata elements. Meanwhile, the successful operation of the proposed approach requires consideration of institutional and infrastructure issues, e.g., who should join, whether participation is mandatory or voluntary, and how government and the private sector collaborate. This is beyond the scope of this paper and deserves to be pursued further in the future.

Data Availability Statement:
The data used to support the findings of this study are available from the corresponding author upon request.