The Urban Observatory: a Multi-Modal Imaging Platform for the Study of Dynamics in Complex Urban Systems

We describe an"Urban Observatory"facility designed for the study of complex urban systems via persistent, synoptic, and granular imaging of dynamical processes in cities. An initial deployment of the facility has been demonstrated in New York City and consists of a suite of imaging systems - both broadband and hyperspectral - sensitive to wavelengths from the visible (~400 nm) to the infrared (~13 micron) operating at cadences of ~0.01 - 30 Hz (characteristically ~0.1 Hz). Much like an astronomical survey, the facility generates a large imaging catalog from which we have extracted observables (e.g., time-dependent brightnesses, spectra, temperatures, chemical species, etc.), collecting them in a parallel source catalog. We have demonstrated that, in addition to the urban science of cities as systems, these data are applicable to a myriad of domain-specific scientific inquiries related to urban functioning including energy consumption and end use, environmental impacts of cities, and patterns of life and public health. We show that an Urban Observatory facility of this type has the potential to improve both a city's operations and the quality of life of its inhabitants.


Introduction
With millions of interacting people and hundreds of governing agencies, urban environments are the largest, most dynamic, and most complex human systems on earth. Some 80% of the US population (and 50% of the global population) now live in cities [1], and within those urban boundaries are a multitude of interactions between the three fundamental components of urban systems: the inhabitants, the natural environment, and the built environment. The study of cities as complex systems is not new [2][3][4]. Research in urban planning [5,6], engineering [7,8], transportation [9,10], etc., all have a rich history of quantifying city life at multiple scales and with networked interactions [11,12]. The application of that work addresses everything from quality of life [13][14][15] to public health [16,17] to sustainability and resilience [18][19][20]. However, there are two recent revolutions that are leading to a dramatic change in the way we understand complexity in urban environments: the systematic collection, digitization, and curation of vast quantities of urban records data [21][22][23] and the development of computational techniques that make it possible to jointly analyze large, disparate data sources. This advent of large scale urban data collection and availability, and the application of modern statistical methods to that data, has led to the emerging field of "urban science" [24][25][26][27], in which approaches from the natural, computational, and engineering sciences are blended with those from the social and earth sciences to describe emergent phenomena and interactions between the human, built, and natural environment components of urban systems. Interactions between these components are characterized by their temporal evolution over multiple time scales and their interconnection between multiple urban subsystems [11,28,29]. 
These "dynamics" in complex urban systems present in contexts as diverse as patterns of transportation activity [30], pedestrian foot traffic [31,32], urban energy use [33], heating and cooling of infrastructure [34,35], air quality variability [36,37], human mobility and displacement [38][39][40][41], and evolution of urban boundaries [42][43][44]. Urban science lines of inquiry are characterized by the analysis of highly heterogeneous data sources of diverse origin and provenance, including public records data, transactional data, and sensor data (e.g., in situ environmental sensors, audio [45,46], imaging/video [31,47,48], etc.), and it is the fusion of those data sources in the context of urban dynamics and functioning that underpins the methodological approach of urban science.
An important field of study for complex urban environments is that of remote sensing [49][50][51][52][53] and the associated data collection from overhead satellite platforms at a variety of wavelengths [54,55]. These satellite data are both large in volume and rich in information content, and their analysis has been used to correlate remote observables with land use characteristics [56], energy consumption [57][58][59][60][61], light pollution [62][63][64], poverty [65][66][67], and urban growth [68][69][70] (among many others). Characteristically, these overhead platforms provide spatial resolution of ∼10s of meters and temporal resolution of ∼1 day or more (though recent advances in satellite technology are pushing towards increased granularity on both fronts). However, at present, this resolution is insufficient to study dynamical interactions in cities on the time scale of minutes, and the overhead vantage point-while ideal for total spatial coverage-only captures photons that are emitted upward, providing an incomplete picture for several science cases (e.g., the ecological and public health impacts of light pollution [62,71,72,73]).
In this paper, we describe an "Urban Observatory" (UO) for the study of dynamical processes in complex urban systems. In Section 2, we outline our initial deployment of this UO in New York City (NYC), a multi-modal platform with the flexibility and spatiotemporal sensitivity and resolution to quantify a myriad of dynamical processes in urban systems that are not observable by other means of remote sensing. In Section 3, we give an in-depth description of the range of urban phenomenology that is accessible via this system, including unique quantifications of three key components of complex urban systems and urban metabolism [74] more generally: energy use in cities, environmental impacts, and human factors. In Section 4, we discuss the linkages between these diverse urban science domain applications in the context of subsystem dependencies and urban metabolism. In Section 5, we conclude with what we envision to be important future deployments of UO facilities in other cities.

Instrumentation
The core UO platform consists of a suite of imaging systems mounted atop tall buildings at a distance of 1-4 km from the area of study (see Figure 1), providing a synoptic view of city skylines [75,76]. As we discuss in more depth in Section 3, the spatial resolution of UO instrumentation is sufficiently granular to segment the imaging at the sub-building level, and (inspired by astronomical observatories operating in survey mode such as the Sloan Digital Sky Survey [77] or PanSTARRS [78]) the system operates persistently in order to observe multiple temporal scales: minutes, hours, days, weeks, months, years, etc. In addition to the imaging devices themselves, associated computing hardware is located both at the device (where minimal edge computations are performed) and at a central server location. The former consists of small mini-computers (Raspberry Pis or similar), while the latter is a multi-purpose backend infrastructure that (1) drives the imaging devices remotely; (2) pulls and queues data transfer from the deployed devices; (3) performs signal processing, computer vision, and machine learning tasks on the images and associated products; and (4) holds and serves imaging and source databases associated with the data collection and analysis. This backend infrastructure/architecture is described in detail in Appendix A. We note that while we present our fully deployed UO below, the fundamental components of the observational system are modular in the sense that individual modalities can be deployed in various combinations (and with various spectral or spatial resolutions) with associated variation in the required backend computational and data storage capacity as well as monetary cost. Furthermore, as we highlight in Section 3, different combinations of modalities enable different urban science drivers. 
While we envision that the UO will ultimately carry out observations across the full (available) electro-magnetic spectrum at multiple time scales, the current deployments-which yield the science content described in Section 3-consist of the following modalities.

Broadband Visible
The first deployed camera system was a three color (RGB), 8 Megapixel USB camera with a 35 mm lens mounted atop a tall building in Brooklyn, NY with a panoramic view of the Manhattan skyline (Figure 1). The instrument was set to a fixed pointing, enclosed in a weatherized housing, and triggered to acquire images via a co-located laptop in a weatherized casing. The initial image capture rate was f = 0.1 Hz [75]. Current deployments have incorporated a pan/tilt mechanism for increased field-of-view (FOV) via multiple pointings, mini-PCs for triggering and data transfer, and ethernet controlled instruments of two types: 20 Megapixel cameras (sampling at f = 0.1 Hz) and DSLRs operating in video mode at a sampling frequency of f ≈ 30 Hz.

Broadband Infrared
Our current broadband infrared (IR) devices are FLIR A310/320 cameras with a pixel resolution of 320 × 240, wavelength range of 7.5-13 micron, and temperature sensitivity of ±2 °C. As with our visible wavelength imaging, our initial IR deployment was encased in a weatherized housing, had a fixed pointing, and operated at f = 0.1 Hz (Figure 1 shows an example image), while subsequent deployments incorporate a pan/tilt mechanism for increased FOV.

Visible and Near Infrared Hyperspectral
In addition to our broadband imaging devices, we have deployed hyperspectral imagers operating at visible and near-infrared (VNIR) wavelengths. These instruments are single-slit spectrographs: the aperture is a vertical slit while a diffraction grating behind the slit generates the wavelength information. They are mounted atop pan/tilt mechanisms that provide horizontal information as the detector is exposed at ≈30 frames per second during the pan. The wavelength range is 0.4-1.0 micron with an average bandwidth of 0.75 nm resulting in ∼850 spectral channels [76,79]. Scans are captured at cadences of ∼10⁻³ Hz. An example image and associated spectra are shown in Figure 1.

Long Wave Infrared Hyperspectral
In April of 2015, the UO carried out a test deployment of a Long Wave IR (LWIR) hyperspectral camera. This actively cooled instrument was sensitive to 7.5-13.5 micron in 128 spectral channels and was operated in survey mode at roughly f ∼ 0.01 Hz, using the same panning mechanism described for the VNIR camera above. The deployment was done in collaboration with the Aerospace Corporation from whom the UO rented the equipment for an 8-day observational campaign [80].

Data Fusion
In order to maximize the utility of our UO facility, it is important that we be able to integrate the massive imaging data sets that we generate with available geospatial data, including publicly available data such as census data, building-level energy consumption, fuel types, external sensor data (e.g., air quality sensors or radar data), etc. Our data fusion utilizes publicly available LiDAR data from NYC [81] to locate the 3-space coordinate that is covered by each pixel. Specifically, using the collinearity equations, we project the topographic LiDAR point cloud into the 2D image plane (given the position, pitch, yaw, roll, and focus of the camera) and, for each pixel, choose the closest LiDAR point. Since the LiDAR resolution is ∼1 foot, there are pixel lines-of-sight for which the nearest LiDAR point is behind the nearest surface in that direction, and so we make a "no overhang" approximation and assign a given pixel the same 3-space coordinate as the pixel above it if the pixel above it is found to be closer to the camera. Finally, we use the publicly available MapPLUTO (Primary Land Use Tax-lot Output) data that contains the geospatial footprints for each building to associate the x, y components of the 3-space coordinate of a pixel with a given building footprint. Thus, we are able to tag each pixel as "observing" a given building (see Figure 2). Additional geospatial data at coarser resolution (e.g., census tracts, utility distribution zones, etc.) can be associated as well.

Figure 2. Photogrammetric techniques allow us to use LiDAR to identify the buildings observed by each pixel (center right) given the building tax-lot footprints from a publicly available database (right). With these building IDs, we can integrate geospatial information with image data.
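The "no overhang" step described above can be sketched as follows. This is a minimal illustration in which we assume the collinearity projection has already produced a per-pixel depth map and coordinate array; the array shapes and function name are ours, not the UO pipeline's:

```python
import numpy as np

def no_overhang(depth, coords):
    """Apply the 'no overhang' approximation: if the pixel directly
    above is closer to the camera, a pixel inherits that pixel's
    3-space coordinate (and depth), since in the absence of overhangs
    the nearer surface must extend downward.

    depth  : (H, W) distance from the camera to the nearest LiDAR
             point along each pixel's line of sight.
    coords : (H, W, 3) 3-space coordinate assigned to each pixel.
    """
    depth, coords = depth.copy(), coords.copy()
    for row in range(1, depth.shape[0]):          # sweep top to bottom
        above_closer = depth[row - 1] < depth[row]
        depth[row, above_closer] = depth[row - 1, above_closer]
        coords[row, above_closer] = coords[row - 1, above_closer]
    return depth, coords
```

The top-to-bottom sweep lets a near surface propagate downward through consecutive occluded pixels, which is the intended behavior for building facades seen edge-on.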
By identifying the individual buildings that are viewable in a given image, and by using topographic information related to the building shapes themselves, we are also able to estimate spatial coverage and the impacts of occlusion. For a large city like NYC, complete coverage is not possible with a single deployment (we currently have two deployment sites in NYC), and in fact our image segmentation methods also allow for the determination of "optimal" vantage points to assess future sites. Finally, it is important to point out that, for a given urban science domain question, the establishment of a statistical sample size is important to make inferences regarding aggregate patterns. Deriving a building ID and occlusion fraction for each pixel allows for a determination of whether a given scene contains sufficient information. For example, in [75] the robustness of patterns of lighting activity (see Section 3.3.1) was assessed by repeatedly and randomly sub-sampling the light sources in the scene to check for coherence of the associated temporal patterns. While not all scenes provide sufficient coverage for all lines of inquiry, in Section 3 below we demonstrate that a wide variety of domain-specific studies are possible from a single vantage point.

Privacy Protections and Ethical Considerations
At its core, the UO is a platform for automated analysis of living environments through images and the application of machine learning, computer vision, and computational statistics to extract dynamical, time-dependent measures of urban functioning. As such, it falls within a category of methods for which ethical considerations are essential to ensure that the platform is dedicated to improving quality of life, protection of natural resources, and prevention of data misuse [82,83].
Concerns related to both the reliability and scope of models and methodology specifically arise in the context of computer vision applications [83][84][85]. In particular, given the nature of our observational platform, it is of highest importance that the UO ensure appropriate privacy protections for inhabitants of cities [86]. This imperative is not unique to the UO nor is it restricted to imaging modalities; all observational platforms (whether acoustic sensors, air quality sensors, traffic sensors, etc.) must enact strict limits on the type and content of the data they collect-and the analysis procedures applied to that data-to ensure the privacy of individuals.
The collection of personally identifiable information (PII) [87] is mitigated by the UO image resolution, which by design does not contain sufficient information to identify individuals within the images (as shown in Figure 3), implying that the UO imaging is not subject to the ethical implications of collecting PII such as facial [88] or gait [89] features. Nevertheless, the urban science domain applications that we describe in Section 3 (see Table 1) constitute the study of living, dynamical patterns in cities. To mitigate the risks associated with the fusion of such data (see Section 2.5) across multiple data sets [90,91], we enact strict additional privacy-protecting policies beyond spatial resolution limitations. The UO's privacy protection policies consist of four core components: (1) no personally identifiable information is collected; (2) all imaging is strictly limited in pixel resolution so that building interiors cannot be seen (see Figure 3); (3) all analysis is aggregated and de-identified; and (4) an individual source cannot be tracked across multiple modalities to infer PII.
Furthermore, it is UO policy that data and analyses are not voluntarily shared with enforcement agencies (e.g., law enforcement, environmental regulatory enforcement, etc.). Finally, while the UO data to date have not been made publicly available, we anticipate that future public releases of imaging data from UO deployments will incorporate additional privacy protection measures including long temporal lags between data collection and data release, release of aggregate (rather than raw) extracted features, temporal and spatial downsampling, and application of differential privacy techniques to extracted patterns of activity [92,93].

Table 1. The urban science domains accessible to UO operations for each broadband (BB) and hyperspectral (HSI) modality, including the required spatial and temporal resolution to extract the observables necessary to inform a given line of inquiry. While the fundamental capability to address these urban science domains has been demonstrated in Section 3, we also give the current status of full-scale analyses in prior and future works.

Urban Science and Domains
The instrumentation and associated operational modes described in Section 2 enable a wide variety of domain science inquiries related to the dynamics of urban systems from sub-second to yearly time-scales. Below, we describe several core aspects of city functioning that we are exploring through the application of computer vision, image processing, machine learning, and astronomical analysis techniques to the UO data streams. These avenues of study can be largely grouped into three categories: Energy, Environment, and Human Factors. These represent topical areas that are closely associated with the fundamental components of urban systems. The science focus of the UO is to develop a deeper understanding of cities through observational determination of the dynamical interplay between these areas.
We note that, regardless of imaging modality (e.g., high frequency video or low frequency hyperspectral), the segmentation of each image into individual buildings by geolocating each pixel in the image, as we have demonstrated in Section 2.5, is an essential first step in all of the topics described below. Furthermore, this "building ID" step allows us to fuse imaging data from the UO with publicly available records data, which can enable lines of inquiry that are otherwise impossible.

Energy
Urban energy use-including electricity, natural gas, heating oil, etc.-drives city functioning and is at the heart of urban science questions related to resource consumption and distribution, infrastructure performance and maintenance, and future urban planning. Furthermore, it serves as the primary source of cities' impacts on the environment [98]. The UO's imaging capabilities with their multiple spatial and temporal scales allow for numerous unique ways of quantifying urban energy use characteristics from remote vantage points.

Remote Energy Monitoring
Remote assessment of energy use in cities by overhead observational platforms has a long history (e.g., [58,59,99]) including the use of lighting as a proxy for both electricity and total consumption [100][101][102]. Recently, there have been several studies using these overhead platforms to estimate the spatio-temporal dynamics of energy use (and the corresponding impacts on CO2 emissions [58,103,104]) on timescales of weeks and spatial scales ∼1 km [48,61,105] with, for example, nighttime stable light data derived from imaging by the Defense Meteorological Satellite Program's Operational Linescan System (DMSP/OLS). Beyond satellites, drone-based platforms have also been deployed to push the temporal granularity down to hourly timescales for short deployments [106].
With the UO platform described above, lighting variability in cities can be observed via broadband visible imaging at significantly higher temporal and spatial resolution while incorporating persistent observations for longitudinal studies, complementing overhead platforms that are capable of broader spatial coverage. In particular, with observations at 0.1 Hz (10 s between images) and spatial resolutions ∼1 m, in [75] we showed that aggregated lighting variability in NYC displays a diurnal pattern of use (on/off behavior) that roughly repeats night-to-night. This variability encodes occupancy characteristics that serve as strong correlates to total energy consumption [107], and in work that is in progress, we find that using these imaging data as an input to a convolutional neural network (CNN) trained on ground truth consumption data results in a 10-15% improvement on network-level electricity consumption prediction models trained on temperature and humidity data alone. These remote proxies for energy consumption at granular spatial scales can provide important operational inputs in near-real time such as building-level outage detection at ∼minute timescales, and they can also serve as important supplements to records-based methods of assessment of energy use on granular spatial scales via up-sampling (down-scaling) of spatially coarse data [108,109].
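As an illustration of the kind of on/off extraction that underlies these lighting variability measurements, the following is a minimal sketch using a simple midpoint threshold; the published analysis is more careful about detrending and noise modeling:

```python
import numpy as np

def on_off_transitions(lum, thresh=None):
    """Binarize a single source's luminosity time series and return
    the indices of off->on and on->off transitions.

    A minimal sketch: we threshold at the midpoint of the observed
    range, then locate state flips.
    """
    lum = np.asarray(lum, dtype=float)
    if thresh is None:
        thresh = 0.5 * (lum.min() + lum.max())
    state = (lum > thresh).astype(int)         # 1 = light on, 0 = off
    flips = np.flatnonzero(np.diff(state))     # index just before each flip
    ons = flips[state[flips] == 0] + 1         # off -> on
    offs = flips[state[flips] == 1] + 1        # on -> off
    return ons, offs
```

Aggregating the resulting transition times over all sources in a scene yields the diurnal on/off patterns discussed above.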

Lighting Technologies and End-Use
Beyond total consumption estimates derived from correlations with integrated nighttime luminosity, the determination of the lighting technology in use in urban environments via the different luminosity signatures in multiple spectral channels is an active area of study [110][111][112]. Lighting technology identification has several important energy applications including identification of targets for energy-efficient lighting upgrades by practitioners, empirical determination of end-use characteristics by consumers [113], and measuring rates of adoption of modern lighting technologies by developing cities [114].
The VNIR hyperspectral instrumentation described in Section 2.3 has sufficient spectral resolution and photometric sensitivity to identify characteristic emission features in nighttime lighting [115,116] at a distance of several kilometers. In [76,94] we showed that the VNIR HSI data obtained with the UO can be used to determine the lighting type of individual sources (with spatial resolution ∼1 m) via comparison to spectra measured in the lab, and we developed Template Activated Partition (TAP) clustering to identify lighting types that were not previously cataloged. An example of the technique applied to recently acquired imaging is shown in Figure 4. Subsequently, other targeted, single-night observations have similarly quantified the lighting profiles of urban environments using both multi-/hyperspectral data [117][118][119] as well as low [120,121] and high [122] frequency variability. By comparing the bulb type of individual sources over several years with persistent UO VNIR imaging, a longitudinal study demonstrating the effects of LED changeover is in progress.

Grid Stability and Phase
In the US, the 60 Hz oscillation of the AC mains voltage is reflected in a sinusoidal oscillation of flux (at twice the mains frequency, i.e., 120 Hz) in the lights connected to the mains for a subset of lighting technologies, e.g., incandescent, traditionally-ballasted halogen, fluorescent, etc. In [95], we showed that observing city lights at high temporal frequencies allows us to monitor the grid frequency at city scale. However, since persistent observations at frequencies of 100s of Hz are not feasible (e.g., due to data transfer and storage limitations), in [95] we used a liquid crystal shutter operating at ∼119.75 Hz to generate an ∼0.25 Hz beat frequency that is observable with cadences of ∼several Hz. As we described, this capability allows for monitoring of the stability of the grid across an urban distribution network, detection of the relative phase of sources, and (with sufficiently accurate observations) potentially also transient phase shifts due to load at the unit level.
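The beat-frequency trick can be verified numerically. The sketch below (with simulation parameters of our own choosing) modulates a 120 Hz flicker - lights flicker at twice the 60 Hz mains frequency - by a 119.75 Hz shutter and block-averages down to a few Hz; the surviving low-frequency power sits at the 0.25 Hz beat:

```python
import numpy as np

# Expected beat between the flicker and the liquid-crystal shutter.
f_flicker, f_shutter = 120.0, 119.75          # Hz (values from the text)
f_beat = abs(f_flicker - f_shutter)           # 0.25 Hz

# Simulate the product of the two oscillations, sampled fast, then
# low-pass filtered by simple block averaging down to 5 Hz.
fs = 2400.0                                   # simulation rate, Hz
t = np.arange(0, 40, 1 / fs)                  # 40 s of data
signal = (1 + np.cos(2 * np.pi * f_flicker * t)) * \
         (1 + np.cos(2 * np.pi * f_shutter * t))
block = int(fs / 5)                           # average down to 5 Hz
slow = signal[: len(signal) // block * block].reshape(-1, block).mean(1)

# The dominant low-frequency component sits at the beat frequency.
spec = np.abs(np.fft.rfft(slow - slow.mean()))
freqs = np.fft.rfftfreq(len(slow), d=block / fs)
print(freqs[spec.argmax()])                   # -> 0.25
```

The fast 120 and 119.75 Hz terms average away over each block, while the slow cos(2π·0.25·t) cross term survives, which is why a camera at a few Hz can track the grid.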
Further work by [96,[122][123][124] showed that an alternative to the oscillating shutter technique for creating an observable beat frequency is the serendipitous use of a camera's frame rate operating near a harmonic of the 60 Hz frequency. This type of grid monitoring via remote imaging is complementary to more standard, modern approaches of large scale in situ phasor measurement unit (PMU) deployments [125,126], and planned future deployments of UO sites in developing cities will enable monitoring of the grid when such in situ devices are unavailable or deployment is infeasible.

Building Thermography at Scale
Thermographic studies of building envelopes are an important diagnostic tool for the assessment of individual building energy and thermal efficiency [127,128], materials [129], and potential defects [130], as well as of larger thermal couplings between urban built structures that can lead to aggregate phenomena such as the urban heat island effect [131][132][133].
With the infrared imaging capabilities described in Section 2.2, we can combine the thermographic studies of individual building facades [127] with the coverage and persistence of the UO platform to generate unique quantifications of patterns of built infrastructure use in cities.
In Figure 5 we demonstrate the use of time-dependent broadband infrared imaging to study the thermographic envelopes of large numbers of buildings in the city skyline. In particular, not only are efficiency characteristics such as heat leaks and thermal couplings detectable [134,135], but the figure also shows that individual HVAC vent duty cycles can be seen at a distance as well. In ongoing and future work we are applying signal processing techniques (e.g., changepoint detection and/or Hidden Markov Models) to these infrared "sources" to extract on and off transitions, much as we have done in the broadband visible imaging case [75]. As with the broadband visible wavelength imaging, this type of source variability can serve as an input to energy consumption models trained on consumption data. In addition, Building Management System operations and heating/cooling efficiency can be measured at scale across thousands of buildings from a single IR sensor.
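A minimal stand-in for the changepoint step mentioned above is a single-changepoint scan that minimizes the two-segment squared error; the production analysis may use more sophisticated multi-changepoint or HMM methods:

```python
import numpy as np

def best_changepoint(x):
    """Return the split index that best divides a series into two
    constant-mean segments (single changepoint, least squares).

    A minimal sketch of changepoint detection on an infrared source's
    temperature time series: scan all split points and keep the one
    whose two-segment mean model has the lowest residual cost.
    """
    x = np.asarray(x, dtype=float)
    best_k, best_cost = None, np.inf
    for k in range(1, len(x)):
        cost = ((x[:k] - x[:k].mean()) ** 2).sum() + \
               ((x[k:] - x[k:].mean()) ** 2).sum()
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k
```

Applied recursively (or with a penalty per changepoint), this yields the on/off transition times of, e.g., an HVAC vent's duty cycle.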

Environment
Energy use in cities (and urban metabolism more broadly) generates byproducts that have significant environmental impact. These impacts have local effects (e.g., degraded air quality leading to breathing-related health outcomes [136][137][138][139][140]), regional effects (e.g., fine-particle air pollution of regions surrounding cities [141]), and, due to the physical and population size of large cities, global effects (e.g., greenhouse emissions and reduced biodiversity [142,143]). The UO instrumentation extends traditional remote sensing of the environment by satellites to increased spatial and temporal resolution (albeit with decreased geospatial coverage), allowing for the dynamical detection of environmental impacts of cities on sub-minute timescales.

Soot Plumes and Steam Venting
As buildings in urban environments burn oil for heat, they produce soot plumes that are dispersed through the ambient air as they are advected away from the source by local winds [144]. Such plumes are responsible for ∼75% of greenhouse gas production in NYC [145] and can potentially have significant impacts on regional and global climate [146,147]. While, to date, most remote sensing studies focus on the aggregate effects of many plumes, Figure 6 shows an example of the use of UO visible wavelength observations at cadences of 0.1 Hz to directly detect individual plumes produced by buildings. The dark, very low surface brightness of the plume makes it extremely difficult to detect directly in the raw imaging. However, foreground/background separation techniques [148,149] reveal the plume clearly towards the center of the image. In subsequent images, the plume is blown to the right as it disperses. In addition to soot plumes, the venting of steam from building heating and cooling systems is also visible.
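A simple version of foreground/background separation for plume detection can be sketched as a per-pixel temporal median background with a robust deviation threshold; this is a minimal illustration, and the cited techniques [148,149] are more sophisticated:

```python
import numpy as np

def foreground(frames, thresh=3.0):
    """Flag transient features (e.g., plumes) by subtracting a
    per-pixel temporal median background.

    frames : (T, H, W) stack of grayscale images at fixed pointing.
    Returns a boolean (T, H, W) mask of pixels that deviate from the
    background by more than `thresh` robust standard deviations.
    """
    bg = np.median(frames, axis=0)            # static scene estimate
    resid = frames - bg
    mad = np.median(np.abs(resid), axis=0)    # per-pixel scatter
    sigma = 1.4826 * mad + 1e-6               # MAD -> std dev (+ floor)
    return np.abs(resid) > thresh * sigma
```

Because the camera pointing is fixed, the temporal median captures the static urban scene, and a low-surface-brightness plume appears as a coherent region of flagged pixels in a single frame.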
The tracking of plumes has significant potential for not only monitoring the total number of plumes produced and the resulting effects on local air quality, but their motion can also be used as a tracer of urban winds, informing studies of air flows through complex urban terrains (including simulations of such effects; e.g., [150]). The plume in Figure 6 is a particularly striking example; however, most plumes are quite difficult to detect through classical computer vision techniques due to the complex urban background, time-dependent shadowing, low source brightness, and the amorphous nature of the plumes. We have recently found that applications of region-based convolutional neural networks (R-CNNs) [151][152][153] can be tuned to detect such plumes [97], and we have developed end-to-end tracking systems for application to these types of data [154]. By fusing the imaging data with data on fuel types for each building through the image segmentation methods described in Section 2.5, future work will focus on deriving the patterns of soot plume production on time scales of minutes and spatial scales of 10s of meters.

Figure 6. Top panels: two broadband visible daytime images separated by one minute. Bottom panels: the same two images, but with the time-independent background removed. The application of foreground/background separation techniques clearly reveals a soot plume (circled in red) that has been ejected from one of the buildings in the scene.

Remote Speciation of Pollution Plumes
Although the fuel type for a given building provides an estimate of the chemical contents of a given plume, broadband visible wavelength observations do not provide direct speciation of plume contents. However, a variety of molecular compounds have strong absorption and emission features in the 7.5-13.5 micron range of the LWIR instrument described in Section 2.4. Over the course of a ∼10 day observational campaign, we showed in [80] that numerous molecular compounds, including ammonia, methane, acetone, Freon-22, CO2, etc., could be identified in plumes emitted from buildings along the NYC skyline.
In Figure 7, we show a simple detection of an ammonia plume using a Principal Component Analysis (PCA) decomposition of a single data cube produced by the LWIR instrument. The various PCA components capture blackbody radiation, atmospheric (including water vapor) effects, and instrumental artifacts. The PCA model for each pixel, when subtracted from the raw data, reveals a spatially localized deficit (i.e., a plume) in the 10.35 micron residual that is due to the absorption line of ammonia at that wavelength. This application of image and signal processing techniques to data from UO-deployed LWIR instrumentation has significant applications [155] for both environmental studies of cities as well as emergency management and tracking of toxic materials release.
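The PCA-residual approach can be illustrated schematically: fit a low-rank model to the per-pixel spectra and inspect the residual for localized deficits. The sketch below uses an SVD on a toy (pixels × channels) array; the shapes, component count, and function name are illustrative only:

```python
import numpy as np

def pca_residual(cube, n_comp=3):
    """Subtract a low-rank PCA model from hyperspectral data to
    expose localized spectral features (e.g., an absorption deficit
    from a plume).

    cube : (n_pix, n_chan) spectra, one row per spatial pixel.
    Returns the residual after removing the mean spectrum and the
    top `n_comp` principal components.
    """
    mean = cube.mean(axis=0)
    centered = cube - mean
    # SVD right-singular vectors are the principal spectral components.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    comps = vt[:n_comp]
    model = centered @ comps.T @ comps + mean   # low-rank reconstruction
    return cube - model
```

In this toy setup the smooth components (blackbody, atmosphere) are absorbed by the low-rank model, while a narrow absorption dip at one pixel and channel survives in the residual as a negative deficit.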

Urban Vegetative Health
Remote monitoring of the health of vegetation by overhead platforms has had a strong focus on the use of multi-and hyperspectral observations to determine reflectance spectra of plants [156][157][158], features of which relate to their chlorophyll content and photosynthetic properties. In particular, combinations and ratios of reflectance values at certain wavelengths can provide strong indicators of the health of vegetation with the "red edge" [156] location at ∼700 nm, the associated Normalized Difference Vegetation Index (NDVI) [159,160], and the Photochemical Reflectance Index (PRI, the normalized difference between 570 and 530 nm) [161,162] being among the most common indicators of photosynthetic efficiency and vegetative health.
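The vegetation indices mentioned above are simple normalized differences of reflectance bands. The sketch below assumes a calibrated reflectance spectrum sampled on a wavelength grid; the NIR/red band centers and window width are illustrative choices of ours, not the UO pipeline's:

```python
import numpy as np

def band(wl, refl, center, width=5.0):
    """Mean reflectance in a narrow window around `center` (nm)."""
    sel = np.abs(wl - center) <= width
    return refl[sel].mean()

def ndvi(wl, refl):
    """Normalized Difference Vegetation Index:
    (NIR - red) / (NIR + red), here with illustrative band centers
    at 860 nm (NIR) and 660 nm (red)."""
    nir, red = band(wl, refl, 860.0), band(wl, refl, 660.0)
    return (nir - red) / (nir + red)

def pri(wl, refl):
    """Photochemical Reflectance Index: the normalized difference of
    reflectance at 531 and 570 nm (a common convention)."""
    r531, r570 = band(wl, refl, 531.0), band(wl, refl, 570.0)
    return (r531 - r570) / (r531 + r570)
```

Healthy vegetation reflects strongly beyond the ∼700 nm red edge and absorbs in the red, so NDVI approaches 1 for vigorous canopies and falls toward 0 under stress.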
Vegetation in urban environments is under unique stress given the density of built structures and the complexities associated with maintenance of urban forestry. In Figure 8, we show that the UO's daytime VNIR observations have sufficient sensitivity to directly measure chlorophyll features in the reflectance spectra of urban vegetation [163] including the red edge, NDVI, and PRI. Moreover, because of the extremely high spectral resolution and sensitivity, as well as the persistent nature of the UO's observational strategy, new metrics on vegetative health can be developed, low signal-to-noise effects such as solar-induced fluorescence [164] can be measured, and short timescale response to local air quality (e.g., particulate matter or ozone [165][166][167]) can be determined to high precision by fusing (see Section 2.5) data from in situ air quality sensors with UO imagery. This has the potential not only to inform urban science studies of the impact of human use of built structures on the natural environment (see Section 4 below), but also to provide operational capacity by generating advanced warning indicators of low-level plant stress [165,168].

Ecological Impacts of Light Pollution
City lighting is known to have detrimental impacts on local ecologies, including on migratory avian behavior [169]: analysis of regional radar data indicates that birds in transit are drawn to the glow of city lights. These effects are both regional [170] and highly local (i.e., individual light installations [171]), with both short timescale and longer seasonal effects [172,173]. Recent work by [174] expanded light exposure estimates for migrating birds in flight to continental spatial scales and with longitudinal baselines of ∼20 years. To date, analyses have focused on aggregated lighting from a given city as a whole (e.g., appropriate for birds flying at high altitudes [175]) or on very bright point sources of light beamed towards the sky [171] that may trap birds during stopover events [176].
Imaging by the UO has the potential to bridge the gap between these spatial scales, providing sufficient spatial, temporal, and spectral resolution to quantify correlations at neighborhood (∼1 km-sq) scale in cities. In collaboration with the New York City Audubon bird conservancy, we have deployed visible wavelength cameras acquiring images at 0.1 Hz to detect time-dependent lighting in lower Manhattan. In work in progress, we are combining these data with regional NEXRAD radar scans [177] to measure the ecological impacts from urban lighting on migratory bird densities at scales of ∼100s of meters and at time scales of minutes.

Human Factors
Urban functioning is fundamentally driven by human decision making. Infrastructure use, transportation choices and their management, economic transactions, etc., all have, at their core, a human element that drives observable patterns of activity. This micro-behavior aggregates to macro-scale patterns of life with diurnal, weekly, monthly, and annual rhythms that can be detected by deployed sensing platforms like the UO.

Patterns of Lighting Activity and Circadian Phase
In [75], we showed that aggregate lighting activity in NYC displays clear diurnal patterns that repeat day after day and week after week. These patterns (on/off transitions of individual light sources) differ for residential versus commercial buildings and, as noted above, can serve as proxies for occupancy characteristics of buildings. In addition, these patterns for residential buildings correlate with the circadian behavior of the population, as shown in Figure 9 [178]. Given that exposure to artificial lighting (and in particular blue-wavelength light) during evening hours can result in melatonin suppression and circadian phase delay [179,180], UO HSI observations at high spectral resolution, combined with aggregate usage duration from our broadband visible wavelength imaging, can bring the techniques of proximal remote sensing to bear on the study of nocturnal light exposure and human circadian phase. Specifically, they enable an empirical measurement of how these patterns vary with ambient lighting intensity, quantifying the effects of light pollution on public health [73,181,182].
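Extracting on/off transitions from a single source's brightness time series can be sketched as below, assuming the lit and dark brightness levels are roughly bimodal. The decile-midpoint threshold is an illustrative choice, not the specific method of [75].

```python
import numpy as np

def on_off_transitions(brightness, threshold=None):
    """Return the indices of on and off transitions in a single light
    source's brightness time series.

    If no threshold is given, use the midpoint between the 10th and 90th
    percentiles as a crude separator of the bimodal lit/dark levels.
    """
    b = np.asarray(brightness, dtype=float)
    if threshold is None:
        lo, hi = np.percentile(b, [10, 90])
        threshold = 0.5 * (lo + hi)
    state = (b > threshold).astype(int)   # 1 = lit, 0 = dark
    d = np.diff(state)
    ons = np.flatnonzero(d == 1) + 1      # first lit frame of each "on" run
    offs = np.flatnonzero(d == -1) + 1    # first dark frame of each "off" run
    return ons, offs
```

Aggregating such transitions over many sources yields the diurnal on/off curves described above; comparing individual sources' transition times across days probes the repeatability discussed next.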
Interestingly, in [75] we also showed that, while the aggregate patterns of light usage in cities are strictly repeating (albeit with different behavior on weekends versus weekdays), a given source does not strictly repeat from one day to the next (nor from one day to the same day one week later). This type of UO data directly addresses the micro/macro behavioral properties of the population [183,184] and the scale of the transition between the two.

Technology Adoption and Rebound
Technological choice, and in particular the correlation of choice with socio-economic and demographic characteristics of the population, is an important indicator of population dynamics. In the energy sector, choice is an end-use characteristic studied most commonly by surveys of users [185]; however, there have been recent works focused specifically on using techniques from remote sensing to estimate lighting (and in particular LED) choice [186].
By combining the lighting patterns described in Section 3.3.1 with the lighting technology identification described in Section 3.1.2 over time, our UO platform is ideally suited not only to quantify choice as a function of population characteristics via fusion (as described in Section 2.5) of imaging data with census data, but also to directly and empirically measure the amplitude of the rebound effect [185,187], in which the benefits of energy efficiency are offset by increased use stemming from the decreased economic incentive to curtail use.

Discussion
In the previous sections we have outlined the hardware and operational characteristics of a multi-modal UO and demonstrated its potential to inform a range of urban science domains. While the ground-based platform has disadvantages, including reduced spatial coverage relative to satellites for a single deployment and the complexity of the scene understanding required to extract relevant image features, the flexibility of the system provides two key benefits relative to other modern observational platforms.

1. Temporal granularity: the cadence provided by the UO is not currently possible (or practical) for any spaceborne or airborne platform. However, the timescales accessible to the UO align with patterns of life present in other urban data sets (energy consumption, circadian rhythms, heating/cooling, technological choice, vegetative health, avian migration, etc.), enabling the fusion of these data to inform the time-dependent dynamical properties of urban systems.

2. Oblique observational angles: even low-lying cities have a significant vertical component, and purely downward-facing platforms are not able to capture these features. This is particularly important for several of the indicators of lived experience described in Section 3, such as light pollution, the effects of which (e.g., circadian rhythm disruption, sky glow, and impacts on migratory species) are due to light emitted "out" or "down" as opposed to "up", or the variation in heating and cooling properties of multi-floor buildings as a function of height in the building.
This combination of spatiotemporal granularity with a side-facing viewing angle allows for unique quantifications of urban dynamics that are not accessible via other methods.
We have also described how these dynamics relate to urban science domains and lines of inquiry and highlighted numerous use cases that can inform the fundamental science of cities as well as the practical operation of urban environments. All of the urban science domain studies presented in Section 3 are associated with the three fundamental components of urban systems: the human, built, and natural environments. In fact, one of the conceptual drivers behind the initial creation of the UO in NYC was the link between these three components and how observable temporal patterns generated by one component result in observable temporal patterns in another component. This is closely connected to the field of socio-environmental systems including the concept of "urban metabolism" within the fields of Urban Ecology, Industrial Ecology, and Urban Political Ecology [74]. Within Urban Ecology in particular, urban metabolism represents the interactions between subsystems in urban environments [188] that link the three fundamental components of cities [189][190][191], and it is precisely these interactions (on multiple temporal and spatial scales) that lead to dynamical variability in UO imaging data.
By way of example, we can quantitatively tie together three of the urban science cases in Section 3 within that framework to demonstrate the holistic methodology enabled by UO observations. Human activity and decision making in buildings leads to energy use through heating and cooling, the cadence of which can be identified via change-point detection methods or other state-based time series analysis approaches applied to UO observations in the IR (Section 3.1.4). That energy produces byproducts in the form of a pattern of recurring pollution plumes (Section 3.2.1) that can be observed and tracked via neural networks or other object detection techniques from computer vision applied to UO broadband visible observations, or extracted from spectra in our HSI observations. The natural environment effects of those pollution plumes on vegetation, for example, can then be assessed by remotely monitoring vegetative health via HSI scans (Section 3.2.3). In each of these observational cases there are patterns at minute, hourly, diurnal, weekly, seasonal, and annual time scales, and deriving the relationships (relative phase, temporal lag, transfer functions, etc.) between those patterns will inform studies of the interactions in urban socio-environmental systems; this will be the subject of future work. While significant work is still required in each of the urban science domains to fully explore the dynamical evolution of each subsystem, the novel UO methodology presented here represents an observational platform for tying together the individual threads of the science of cities.
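The change-point detection step mentioned above can be illustrated with the standard least-squares (binary segmentation) criterion for locating a single mean shift in a time series. This is a generic sketch of the technique class, not the specific state-based method applied to UO IR observations.

```python
import numpy as np

def mean_change_point(series):
    """Locate the single most likely mean-shift change point by
    minimizing the total within-segment squared error over all splits
    (the standard least-squares / binary-segmentation criterion)."""
    x = np.asarray(series, dtype=float)
    n = len(x)
    total_cost = ((x - x.mean()) ** 2).sum()
    best_k, best_gain = None, -np.inf
    for k in range(2, n - 1):           # require at least 2 points per segment
        left, right = x[:k], x[k:]
        cost = (((left - left.mean()) ** 2).sum()
                + ((right - right.mean()) ** 2).sum())
        gain = total_cost - cost        # variance explained by the split
        if gain > best_gain:
            best_gain, best_k = gain, k
    return best_k
```

Applied recursively to each resulting segment, the same criterion yields multiple change points, e.g., the onset and cessation of a building's heating cycle in an IR brightness time series.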

Conclusions
Modern cities are systems with tremendous complexity, and the functioning and behavioral properties of those systems have significant implications for their current and future environmental impacts as well as the quality of life of their inhabitants. It is through a detailed analysis of the three fundamental components of urban systems (the human, built, and natural environments) that one can uncover the temporal dynamics that govern urban behavior. Measuring those dynamical interactions of urban systems requires high spatial and temporal resolution, with sufficient coverage to generate a representative sample of the system as a whole. We have presented an observational platform for the collection of data that can provide inputs to machine learning, computer vision, image processing, and astronomical analysis techniques that extract information relevant to the functioning of cities. Our first realization of that platform in New York City is the creation of the Urban Observatory facility, consisting of imaging systems (both broadband and hyperspectral) sensitive to visible and infrared wavelengths, with an operational mode that is persistent, synoptic, and granular. The urban science and domain topics that these data can address are broad, ranging from energy use and its environmental impacts to patterns of life and public health. As the technology develops, deployment of similar Urban Observatories to cities of various sizes, local environments, and localities will enable a comprehensive and rich comparative study of diverse cities, deepening our core understanding of complex urban systems.

Acknowledgments:
We thank the New York City Audubon for their partnership and collaboration and Ryan Holifield, Woonsup Choi, Arijit Sen, Dawn Erb, and David Kaplan for conceptual conversations surrounding the Urban Observatory's ability to measure urban metabolism.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Backend Infrastructure
The deployed instruments and their associated operational modes described in Section 2 require a flexible and robust backend computing infrastructure to collect, store, and analyze the large volume of data generated by the UO. This infrastructure consists of both hardware and software components that, taken together, operate continuously to yield persistent observations. Our backend infrastructure consists of the following core components with associated functionality (a full end-to-end illustration of the UO's operational methodology is shown in Figure A1).

Figure A1. An end-to-end illustration of the Urban Observatory's operational methodology. Remotely deployed instrumentation collects raw imaging data from visible to infrared wavelengths in both broadband and hyperspectral modalities that are transferred and permanently stored in an imaging database. The images are processed through an image processing pipeline that extracts source features (brightness in the visible or infrared, spectrum, chemical species, lighting type, etc.) that are themselves stored in a parallel source database. The images, source characteristics, and external data (building characteristics, aggregate socio-economic and demographic characteristics, in situ sensor data, etc.) are fused via geospatial projection of each pixel (see Section 2.5), and a variety of machine learning and computational statistical methods are applied to inform the variety of urban science domain studies described in Section 3 and summarized in Table 1.
Camera control devices-Each imaging device is equipped with a mini-computer that opens a direct link with the camera itself. This machine is tasked with communicating directly with the camera, lens, and other peripherals and issuing image acquisition commands. In certain instances, this computer can also be used to perform edge computations including compression or sub-sampling of the data. Acquired data may be saved temporarily on disk on this machine for buffered transfer over an encrypted session back through the gateway server, or be written directly to bulk data storage.
Gateway server-The main communications hub between our computing platform and the deployed instrumentation is a gateway server that operates on a publish-subscribe model, issuing scheduled commands to the edge mini-computers. This hub is also responsible for the pull (from the deployment) and push (to the bulk data storage) functionality for data acquisition, as well as serving as the firewalled gateway through which remote UO users connect to the databases in our computing platform.

Bulk data storage-At full operational capacity, a UO site (consisting of a broadband visible camera operating at 0.1 Hz, a broadband infrared camera operating at 0.1 Hz, a DSLR operating in video mode, and a VNIR hyperspectral camera operating at 10⁻³ Hz) acquires roughly 2-3 TB per day. This data rate necessitates not only careful data buffering and transfer protocols to minimize packet loss from the remote devices, but also large bulk data storage with an appropriate catalog for the imaging data. This ∼PB-scale storage server is connected to our computing servers using NFS (Network File System) protocols for computational speed. The storage server also hosts the parallel source catalogs that store information extracted from the data.

Computing server-Our main computing cluster used to process UO data consists of a dedicated >100-core machine that is primarily tasked with processing pipelines including registration, image correction, source extraction, etc. We have designed our own custom platform-as-a-service interface that seamlessly allows UO users to interact with the data while background data processing and cataloging tasks operate continuously.

GPU mini-cluster-Several of the data processing tasks described in Section 3 require building and training machine learning models with large numbers of parameters, including convolutional neural networks.
For these tasks, we use a GPU mini-cluster that is directly connected to our main computing server and which is continuously fed streaming input data from which objects and object features are extracted.
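The quoted 2-3 TB/day rate can be sanity-checked with a back-of-the-envelope calculation. All per-acquisition sizes below are assumed for illustration (the text specifies only the cadences and the aggregate daily figure), as is the 30 Hz video rate for the DSLR.

```python
# Back-of-the-envelope daily data volume for one UO site. Every
# bytes-per-acquisition value below is an illustrative assumption,
# not a measured UO number; only the cadences come from the text.
SECONDS_PER_DAY = 86_400

instruments = {
    # name: (cadence in Hz, assumed bytes per acquisition)
    "broadband_visible": (0.1, 50e6),    # assumed ~50 MB raw frame
    "broadband_ir":      (0.1, 1e6),     # assumed ~1 MB thermal frame
    "dslr_video":        (30.0, 0.6e6),  # assumed ~0.6 MB compressed frame
    "vnir_hsi":          (1e-3, 3e9),    # assumed ~3 GB hyperspectral scan
}

def daily_volume_tb(instruments):
    """Total acquisition volume per day in terabytes (1 TB = 1e12 bytes)."""
    total_bytes = sum(hz * nbytes * SECONDS_PER_DAY
                      for hz, nbytes in instruments.values())
    return total_bytes / 1e12
```

With these assumed sizes, the total lands in the same few-TB-per-day regime quoted above, with the continuous video stream dominating the budget.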