Linking the Urban Environment and Health: An Innovative Methodology for Measuring Individual-Level Environmental Exposures

Environmental exposures (EE) are increasingly recognised as important determinants of health and well-being. Understanding the influences of EE on health is critical for effective policymaking, but better-quality spatial data is needed. This article outlines the theoretical and technical foundations used for the construction of individual-level environmental exposure measurements for the population of a northern English city, Bradford. The work supports ‘Connected Bradford’, an entire population database linking health, education, social care, environmental and other local government data over a period of forty years. We argue that our current understanding of environmental effects on health outcomes is limited both by methodological shortcomings in the quantification of the environment and by a lack of consistency in the measurement of built environment features. To address these shortcomings, we measure the environmental exposure for a series of different domains including air quality, greenspace and greenness, public transport, walkability, traffic, buildings and the built form, street centrality, land-use intensity, and food environments as well as indoor dwelling qualities. We utilise general practitioners’ historical patient information to identify the precise geolocation and duration of a person’s residence. We model a person’s local neighbourhood, and the probable routes to key urban functions aggregated across the city. We outline the specific geospatial procedure used to quantify the environmental exposure for each domain and use the example of exposure to fast-food outlets to illustrate the methodological challenges in the creation of city and nationwide environmental exposure databases. The proposed EE measures will enable critical research into the relationship and causal links between the built environment and health, informing planning and policy-making.


Introduction
The influential role of the built environment in shaping our health and well-being is increasingly being recognised [1]. Growing evidence suggests strong relationships between exposures to environmental characteristics such as air pollution [2][3][4], noise [5][6][7], green space [8][9][10] and greenness [11][12][13], public transport [14,15], walkability and street centrality [16][17][18], unhealthy food [19][20][21], or indoor dwelling qualities [22,23] and health outcomes across the globe. While the detrimental effects of environmental exposures have been broadly demonstrated in numerous cases, there is also growing evidence for positive effects of built environment features. Individual features of the built environment have been linked to increased physical activity [24], active travel [25], or lower levels of obesity [26]. For example, neighbourhoods that are walkable are associated with increased physical activity and other positive health outcomes [27,28]. Moreover, the built form and shape of green spaces have an impact on the number of walking trips fostering physical activity [29]. Some built environment features can be both beneficial and detrimental: proximity to busy roads, for example, may support active travel but also expose people to potentially high levels of air pollution. Greater precision in environmental indicators would enable a more nuanced analysis of the relationship between the built environment and health than has been possible to date.
A major limitation of the existing ways of constructing large-scale spatial indicators is that they focus on aggregate levels such as administrative boundaries or postcodes, and do not take into account individual-level exposure [30]. Increasing evidence shows that exposures perceived at the street level (e.g., air pollutants) can have particularly detrimental effects on health [31,32]. Such negative effects can occur even at low levels (i.e., short-term and low concentrations) of environmental exposure [33]. Exposures modelled at national scales have been found to significantly underestimate the real-world exposure perceived at the street level [34]. Evidence from small-scale studies indicates that characteristics of the built environment, such as higher densities of buildings forming street canyons, may lead to increased levels of exposures [35]. This highlights the need for better quality spatial data at smaller scales to differentiate between conditions at the finest, street-level scale, since quantifying environmental exposures at the scale at which they are experienced by individuals (rather than in arbitrarily designated geographies) can provide crucial information for improving the knowledge on the impact of environmental exposures on health and well-being. Such knowledge will inform spatial planning decision-making, e.g., in guidance for targeted modifications of the built environment that accounts for its complexity.
Agreement over the importance of individual features has varied in the past. A recent meta-narrative review by Ortegon-Sanchez et al. [36] on the relationships between the built environment and child health identified vast inconsistencies in the way in which neighbourhood characteristics are measured and conceptualised within health studies. These inconsistencies might explain the existing contrasting evidence for specific environmental features and health outcomes, such as between quantifications of food environments and obesity [37], or between urbanicity and schizophrenia [38]. In the latter, advances are being made in improved precision of measuring built environment features, and it is to this literature that we hope our work will contribute [39].
Finding relationships between specific built environment features and health outcomes opens the possibility for targeted planning and policy interventions. Yet, it is imperative for such interventions that decision-makers understand the precise built environment features involved, so that policies intended to improve health outcomes do not have unintended adverse effects [40]. Modifying the built environment effectively requires in-depth knowledge of the importance of individual features, as well as their interrelated effect on health. However, causal links between some built environment features and health outcomes are still to be proven definitively, despite many attempts to do so [41].
There are significant differences and heterogeneity between the physical and environmental characteristics of neighbourhoods of a similar population density within cities (e.g., dispersed high-rise and dense low-rise buildings). A large part of the field of urban planning is dedicated to the quantification of such differences, measuring the geographical, form and functional properties of the built environment and their relationships [42,43]. Such nuanced and detailed measures of the built environment have not yet been systematically incorporated into the research on public health. We extend the argument by Cyril et al. [44] of 'an urgent need for health studies to standardise measures of urbanicity', to 'the urgent need for comprehensive and detailed standardised longitudinal measures of the built environment' for health and public health studies. Challenges for health research in understanding the importance of environmental features arise not only from the difficulty of quantifying these features effectively but also from their complex intersection with additional social, cultural or economic factors.
Existing and emerging large-scale linked population datasets which track health over time provide rich individual-level information on health and socio-economic factors. Combining such datasets with high-resolution geospatial environmental information of exposures, including the length and extent of exposure, holds the potential to improve the value of existing population datasets for the discovery of causal relationships between built environment characteristics and health outcomes. Geospatially-enriched population data also enables investigations into the interaction effects between different built environment characteristics on health.
We aimed to construct a detailed set of built environment indicators for every address in Bradford, accounting for the limitations previously identified. This paper describes their development, highlights key challenges, and outlines potential pathways to accelerate discoveries. We hypothesise that by linking meaningful built environment features to longitudinal whole population databases, researchers will (a) improve the methods for correlating the built environment with health and social outcomes; (b) support the production of new knowledge on how the built environment contributes to diseases; and (c) by linking to cohort data, improve the understanding of how and when the built environment shapes long-term health outcomes that are the result of growing up in unhealthy environments.
We quantify a series of environmental exposures at an individual level across 11 different domains and link these to individual-level health data to investigate relationships at a high level of detail. Environmental information is gathered for a whole population and we outline how these can be linked to a longitudinal birth cohort in Bradford, England. Our exposure indicators capture qualities across air pollution, greenspace and greenness, public transport, walkability, traffic, buildings and the built form, street centrality, land use, and food environments as well as indoor dwelling quality domains. We demonstrate our methods on data around the year 2020 and outline how the proposed methods hold the potential to be created longitudinally for any year between 2007 and 2022, and in future years, if data are available.
Section 2 presents our theoretical approach, the methodologies of increased spatial precision and the standardised conventions in quantifying the built environment, Section 3 provides critical indicators to inform the existing contrasting pieces of evidence within health research, and Section 4 discusses challenges in creating city and nationwide datasets, and outlines potential pathways to accelerate discoveries in public health through the application of large-scale environmental exposure and linked population datasets.

Materials and Methods
Increasing evidence points to the importance of individual built environment characteristics on health and human behaviour, especially those that can be linked to encouraging physical activities [45,46]. In line with this recognition, we measured exposure through the perspective of an individual perceiving the built environment as they traversed and used it. We utilised a person's address as the starting point to measure the likelihood of a person interacting with their immediate surroundings, and for simulated journeys through the streets, the neighbourhood and to urban features. We used this lived environment to quantify the exposures and to aggregate these at the address, which also formed a point of linkage to the individual-level health information from historical data held in a whole population database and sourced from general practitioners (GPs, i.e., family doctors) in the city of Bradford.
In doing so, our approach differed from the existing large-scale environmental exposure indicator datasets that are based on arbitrary geographic boundaries [30], by capturing the granular differences of the built environment at the level at which they are experienced. This is an important distinction as, e.g., postcodes and administrative boundaries are generally affected by the modifiable areal unit problem [47], and their centroids can introduce unobserved biases into the precision of computed proximities. For example, our analysis has shown that the location of address points within the rural postcodes in Bradford can be [...] benefits data from local authorities, as well as crime data from West Yorkshire police. This includes, e.g., variables on a patient's medication, their clinical history, or emergency care usage. Connected Bradford hence allows the tracking of routine health outcomes such as a patient's weight or body mass index (BMI), the frequency of their health and mental health service use, the prescription of specific drugs (e.g., asthma medication), or school attendance, among many others (see [56] for a comprehensive list and description).

Data Linkage
The data linkage between our EIs and the Connected Bradford database was achieved through a five-step process, i.e., (1) we derived environmental EIs for each residential Unique Property Reference Number (UPRN) in Bradford; (2) we received person identifiable information (i.e., historic address information) including pseudonymised NHS numbers from the Bradford Teaching Hospitals NHS Foundation Trust; (3) we georeferenced this historic address information and linked it to the UPRNs (as outlined below) and included EI data for each; (4) we pseudonymised the UPRNs and removed all the identifiable information; and (5) we provided Connected Bradford with a dataset of pseudonymised NHS numbers, pseudonymised UPRNs and linked EI data.
For step 3, we conducted large-scale georeferencing of the available patient address information from all the participating GPs within Bradford. The data comprised address information from 1950-2022, including more than 22,500,000 address rows for 1,110,000 unique pseudonymised NHS numbers. We used this address information in conjunction with Ordnance Survey (OS) AddressBase Premium data (which provides up-to-date accurate information about addresses and properties in the UK) to match a patient address record to its respective UPRN. UPRNs are unique identifiers for every addressable location in the UK. We matched the UPRNs to each patient address by adapting the open-source address-matching Oracle and R package addressMatchR. The code was rewritten to run solely in the open-source programming language R and further adaptations were made to tailor the approach to the particularities of GP address information, requiring an extended simplification and cleaning of the input data (an open-source version of the code can be obtained via the GitHub repository https://github.com/kimonkrenz/CBHE (accessed on 14 December 2022)). The result was a historical address record for every patient, linked to their respective NHS number within Connected Bradford. This enabled the linkage, via UPRNs, of environmental information to a patient and to their health data and data on the wider determinants of health captured in the Connected Bradford database. Sohal et al. [56] provide details of the granted ethical approval (IRAS ref: 239924, CAG ref: 18/CAG/0091 and REC ref: 18/YH/0200) and the implemented mechanism to prevent the intentional or unintentional re-identification of individuals within this dataset.
In parallel, we used the geographic information and address classification available within the OS AddressBase Premium dataset to identify all residential addresses, their UPRN and their location in Bradford. EIs could, thus, be generated for every residential address in Bradford and subsequently linked without the need to access patient information in the process. Our methodology also allowed the computation of EIs for alternative address information or UPRNs, such as schools, libraries or workplaces, given these are linked to a person.
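Steps 3 to 5 of the linkage process can be illustrated with a minimal sketch. It is not the project's implementation: the text does not describe the pseudonymisation mechanism, so the keyed hash (HMAC-SHA-256) used here is an assumption, as are the record layout and the names `pseudonymise` and `link_exposures`.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest that stands in for the identifier.

    Assumption: a keyed hash is used only for illustration; the source does
    not state which pseudonymisation mechanism was applied.
    """
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def link_exposures(address_history, exposures_by_uprn, secret_key):
    """Attach exposure indicators (EIs) to residence records and drop raw UPRNs.

    address_history: list of dicts with hypothetical keys
        'pseudo_nhs', 'uprn', 'start', 'end' (one row per residence spell).
    exposures_by_uprn: dict mapping a UPRN to a dict of indicator values.
    """
    linked = []
    for record in address_history:
        indicators = exposures_by_uprn.get(record["uprn"])
        if indicators is None:
            continue  # no indicators available for this UPRN
        linked.append({
            "pseudo_nhs": record["pseudo_nhs"],
            "pseudo_uprn": pseudonymise(record["uprn"], secret_key),
            "start": record["start"],
            "end": record["end"],
            **indicators,
        })
    return linked
```

Because the indicators are keyed by UPRN, they can be computed without access to patient information and joined only at this final step, which mirrors the separation described above.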

Built Environmental Data Sources
The base for our analyses comprised high-resolution vector-based geospatial data from OS (i.e., OS MasterMap Topography, OS MasterMap Highways Roads and Path, OS AddressBase and OS Points of Interest (POI) data). The OS datasets provided the most detailed, comprehensive and up-to-date view of Great Britain's landscape, the built environment and the land-uses. This high-resolution vector-based OS information has been available in a comparable and complete format since 2007. We used this geospatial information from 2021 to construct a street network model through which we measured the proximities to environmental exposures. This network comprises all the publicly accessible roads, and also includes all the footpaths through towns and cities to comprehensively capture how and where a person might walk. The network is dissected into 20 m long segments, of which each start and end node can constitute the beginning or end of journeys to and from urban features. The environmental exposures were based on either primary data collection, modelled information, or where appropriate, secondary data. This included vector-based information on the location of urban features (e.g., POI, National Public Transport Access Nodes (NaPTAN)), entrance/access points (e.g., OS Open Greenspace), the parts of streets, pathways and their properties (OS MasterMap Highways), 3D building information (OS MasterMap), and image-based satellite data (Landsat 8-9), as well as additional data sources capturing the indoor (e.g., energy performance certificates (EPC)) and outdoor qualities (e.g., local authority traffic and air quality data).
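The dissection of the street network into 20 m segments can be sketched as follows. This is an illustrative routine on planar coordinates in metres, not the procedure used in the study; in particular, the handling of the final piece that is shorter than 20 m is an assumption, and `split_polyline` is a hypothetical name.

```python
import math


def split_polyline(points, segment_length=20.0):
    """Cut a polyline into consecutive pieces of at most segment_length metres.

    points: list of (x, y) vertices in a projected coordinate system (metres).
    Returns a list of segments, each a list of (x, y) vertices. The last
    segment may be shorter than segment_length (an assumption of this sketch).
    """
    segments = []
    current = [points[0]]
    remaining = segment_length  # distance left before the next cut
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        edge = math.hypot(x1 - x0, y1 - y0)
        travelled = 0.0
        # Place cut points along this edge while it is longer than what remains.
        while edge - travelled > remaining + 1e-9:
            travelled += remaining
            t = travelled / edge
            cut = (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
            current.append(cut)
            segments.append(current)
            current = [cut]
            remaining = segment_length
        current.append((x1, y1))
        remaining -= edge - travelled
        if remaining <= 1e-9:
            # The edge ended exactly on a cut point: close the segment here.
            segments.append(current)
            current = [(x1, y1)]
            remaining = segment_length
    if len(current) > 1:
        segments.append(current)
    return segments
```

Each segment's first and last vertex can then serve as a start or end node for simulated journeys.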

Construction of Exposure Variables
Utilising a geographic information system (GIS), we derived a series of measures quantifying exposures to and within the built environment. For this, we defined nine different types of spatial relationships through and at which an individual can be exposed to the environment and features of the built environment (Figure 2). This approach took account of the most basic forms through which humans interact with the environment, both static (e.g., at the place of residence) and dynamic (e.g., while walking along a route). These nine spatial relationships can be divided into exposure at specific locations, i.e., (a) at and within the residential home, (b) at the residential street next to the home, (c) at the urban block (i.e., the area enclosed by streets and paths) containing the residential home, (g) at a circular area surrounding the residential home, as well as exposures at places of potential interaction, i.e., (h) along a route to an urban feature or (d) to the closest urban feature from the residential home (e.g., park entrances), (e) along a route to all, or the average distance to all the urban features of a specific type, (f) to urban features within a catchment area, and (i) to properties of routes within a catchment area. We used an open-source PostgreSQL object-relational database in combination with PostGIS, a spatial database extender, for the construction of a geospatial database and the calculation of the environmental exposure variables. PostGIS allows for fast and large-scale spatial queries, enabling a simultaneous computation of the spatial metrics for an entire city and potentially an entire country.
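Two of the spatial relationships, (d) the closest urban feature and (f) the urban features within a catchment area, can be illustrated on a toy street network. The study performed these computations in PostgreSQL/PostGIS; the pure-Python Dijkstra search below is only a sketch of the underlying idea, and the graph layout and function names are hypothetical.

```python
import heapq


def network_distances(graph, source):
    """Dijkstra search: shortest metric distance from source to reachable nodes.

    graph maps a node to a list of (neighbour, length_in_metres) pairs.
    """
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, length in graph.get(node, []):
            candidate = d + length
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return dist


def closest_feature(graph, address_node, feature_nodes):
    """Relationship (d): (distance, node) of the nearest feature, or None."""
    dist = network_distances(graph, address_node)
    reachable = [(dist[f], f) for f in feature_nodes if f in dist]
    return min(reachable) if reachable else None


def features_within(graph, address_node, feature_nodes, catchment_m):
    """Relationship (f): features reachable within catchment_m along the network."""
    dist = network_distances(graph, address_node)
    return sorted(f for f in feature_nodes if dist.get(f, float("inf")) <= catchment_m)
```

For a city-wide computation, the same logic would be run from the network node nearest to each address, which is where a spatial database becomes useful.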
Figure 2. Different methods of quantifying environmental exposure at varying spatial relationships: at (a) an address; (b) an address street; (c) an urban block containing the address; from an address to (d) an environmental feature; (e) all environmental features; (f) environmental features within a catchment area; and within (g) a buffer around an address; (h) a route from an address to an urban feature; (i) a catchment area from an address.
In addition to the outlined spatial relationships, we introduced four different distance types used in our analysis, i.e., (1) the metric distance as the crow flies (ignoring particularities of the spatial configuration), (2) the metric distance through the street network (representing how a human travels through and perceives the environment), (3) the angular distance through the street network (representing how a human navigates the environment), and (4) the distance decay metric distance through the street network (incorporating the effect of distance into the likelihood of an interaction between a human and the environment). Distance decay functions are standard methods in pedestrian accessibility modelling within the field of urban and transport planning. The core aim of these approaches is to incorporate the effect of distance in the analysis, i.e., the decreasing importance of an urban feature to a person with an increasing distance from it, and to overcome the limitations of choosing otherwise necessary distance cut-off points. Additionally, distance decay functions provide a method to continuously decrease values until converging to zero, rather than an abrupt cut-off as in the buffer or catchment area approaches. While a plethora of different distance decay functions has been proposed, there is little statistical difference between these [57]. For the environmental indicators that use the distance decay metric distance (D), we have selected the following exponential distance decay function:
$$D = e^{-kd}$$
where d is the distance in metres and k is a decay parameter. We applied this function to distances between addresses and urban features at varying decay parameters (Figure 3) to account for the potential differences in user groups (e.g., children, families with prams, and the elderly).
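A short sketch may help to show how a distance decay weight behaves. It assumes the exponential form D = exp(-k * d), with d in metres; the decay parameters in the usage note are hypothetical and are not the values used in the study.

```python
import math


def decay_weight(distance_m, k):
    """Exponential distance decay weight, assuming D = exp(-k * d)."""
    return math.exp(-k * distance_m)


def decay_weighted_count(distances_m, k):
    """Sum of decay weights over all features: a 'soft' count without a cut-off."""
    return sum(decay_weight(d, k) for d in distances_m)
```

With a hypothetical k of 0.005, a feature 100 m away receives a weight of about 0.61 and one 800 m away a weight of about 0.02; a larger k, which might represent a less mobile user group, makes the weights fall off faster.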

Selection of Environmental Domains
We selected a series of 11 environmental domains for which we generated exposure indicators. This selection was based on the existing evidence pointing to potential associations between each of these built environment domains and health outcomes, and it was informed by the policy priorities of the City of Bradford. Where possible, we sought to refine the existing measurements in order to build on the existing research in this domain and to offer a methodology that could be adopted relatively straightforwardly. The following section will outline the reasoning for each domain, their data source and the specificities for their calculation. Unless further specified, see Section 3 for details on the various spatial relationships and scales used for each domain.

Air Quality
Air pollution has long been associated with negative health outcomes [2][3][4]. Population-wide analyses into the relationships between exposure to air pollution and health often utilise aggregate information, such as the UK Emissions data from the Department for Environment, Food and Rural Affairs (Defra), which features a 1 × 1 km resolution [58]. Villeneuve and Goldberg [59] highlight this as a common shortcoming in studies and advocate for high-resolution spatial datasets due to the variability of air pollution at small scales. We addressed this need by utilising high-resolution (1 × 1 m) air pollution data from 2018 on the annual average concentration of particulate matter (PM10 and PM2.5) and nitrogen oxides (NOx) from the City of Bradford Metropolitan District Council. This dataset was the result of an air quality model (i.e., Ricardo-AEA Rapid-Air complex dispersion modelling), which estimates the concentration at a 1 × 1 m resolution. The model utilises information gathered from more than 200 automatic and non-automatic monitoring sites across Bradford's urbanised areas (see [60] for a detailed description of the underlying air quality monitoring data and the location of monitoring sites), in conjunction with local data on industrial sites, vehicular and train traffic, background concentrations and domestic heating activities. For rural areas where air pollution is not a concern and small-scale monitoring stations and tubes are scarce, we used the UK emissions data. We note that research designs that are interested in traffic-related pollution might be better placed to use traffic variables.
We calculated the average and maximum values of the PM of 10 and 2.5 and NOx for the buffer and catchment-based buffer areas at varying distances.
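The aggregation of a pollution raster over a circular buffer can be sketched as below. The grid layout (row 0 at the southern edge), the inclusion rule (a cell counts when its centre lies within the radius) and the name `buffer_stats` are assumptions made for illustration; the study's own computation was carried out in a spatial database.

```python
import math


def buffer_stats(grid, cell_size, origin, centre, radius):
    """Mean and maximum of raster cells whose centres fall within a buffer.

    grid: list of rows of concentration values; grid[0][0] is the cell whose
        lower-left corner sits at origin (assumed layout).
    cell_size: cell edge length in metres.
    origin: (x, y) of the lower-left corner of the raster.
    centre: (x, y) of the address point.
    radius: buffer radius in metres.
    Returns (mean, maximum), or None if no cell centre lies inside the buffer.
    """
    ox, oy = origin
    cx, cy = centre
    values = []
    for row, line in enumerate(grid):
        for col, value in enumerate(line):
            x = ox + (col + 0.5) * cell_size
            y = oy + (row + 0.5) * cell_size
            if math.hypot(x - cx, y - cy) <= radius:
                values.append(value)
    if not values:
        return None
    return sum(values) / len(values), max(values)
```

A catchment-based variant would replace the circular test with membership in the area reachable along the street network within the chosen distance.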

Road Traffic
Road traffic, through noise [61] and air pollution (see Section 2.5.1), can have a series of adverse effects on health and quality of life, which is increasingly being recognised by local authorities. Road traffic is generally measured at the street level through annual manual traffic counts at various locations (e.g., major and minor roads) and aggregated to annual average daily flows (AADF). In the UK, flows on streets that are not covered by manual counts are estimated with street-level estimation models. Such modelled data often fail to capture the fluctuations during the day that are common to traffic flows and do not provide sufficient information for small local roads. To overcome this limitation, we utilised the UK-wide Trafficmaster data, which has tracked GPS information from more than 135,000 vehicles at 1 to 10 s intervals since 2019. The data is purchased by the Department for Transport and is available to local authorities across the UK. Besides the count data, the data also contains information on the average speeds and free-flow speeds per street segment. We used these to derive a ratio-based congestion variable from the free-flow speed a and the average speed x during the observed period. We calculated the annual average and maximum bidirectional count of vehicles, as well as the maximum and average level of congestion for three time periods (i.e., peak morning (07:00-09:00), off-peak (10:00-16:00) and peak evening (16:00-19:00)) during weekdays. Besides measuring the count and congestion at the address street, we also aggregated the data for a 300 m catchment area around the address.
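The exact form of the congestion ratio is not given in the text above, so the following sketch should be read as one plausible definition rather than the authors' formula. It assumes the relative speed reduction (a - x) / a, which is 0 under free-flow conditions and approaches 1 at a standstill, and it treats the three time periods as half-open hour ranges. All names are hypothetical.

```python
def congestion_ratio(free_flow_speed, average_speed):
    """Relative speed reduction (a - x) / a, floored at 0.

    Assumption: this form is illustrative; the source only states that the
    variable is a ratio of free-flow speed a and average speed x.
    """
    if free_flow_speed <= 0:
        raise ValueError("free-flow speed must be positive")
    return max(0.0, (free_flow_speed - average_speed) / free_flow_speed)


# Weekday periods as [start_hour, end_hour), following the periods in the text.
PERIODS = {
    "peak_morning": (7, 9),
    "off_peak": (10, 16),
    "peak_evening": (16, 19),
}


def summarise_congestion(observations, free_flow_speed):
    """Average and maximum congestion per period for one street segment.

    observations: list of (hour, average_speed) pairs for weekday hours.
    Returns a dict mapping period name to (mean, maximum); periods without
    observations are omitted.
    """
    summary = {}
    for name, (start, end) in PERIODS.items():
        ratios = [
            congestion_ratio(free_flow_speed, speed)
            for hour, speed in observations
            if start <= hour < end
        ]
        if ratios:
            summary[name] = (sum(ratios) / len(ratios), max(ratios))
    return summary
```

The same per-segment summaries could then be averaged over all segments within the 300 m catchment area of an address.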

Greenness and Greenspace
Numerous studies have highlighted associations between green space [8][9][10] or greenness [11][12][13] and health outcomes. The general method for measuring the degree of greenness is the normalised difference vegetation index (NDVI). This pixel-based metric estimates the density of green vegetation within satellite imagery. A high pixel resolution is critical to deriving meaningful small-scale estimations, and previous research [62] has demonstrated that a 30 m resolution provides sufficient detail. The NDVI can be calculated from historic and globally available satellite data, such as the USGS Landsat 8 product (2013-present), as follows:
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + \mathrm{RED}}$$
where NIR is the light reflected in the near-infrared spectrum (Landsat 8 Band 5) and RED is the light reflected in the red range of the spectrum (Landsat 8 Band 4). We used cloud-free data from May 2020, the greenest month on record. We calculated the average NDVI within varying radii and walking distances. While the NDVI can capture the general level of greenery in an area (sometimes referred to as availability), it lacks information on the type and accessibility of usable green spaces. For this reason, we calculated the accessibility by computing the distance from an address to the closest entrance points of green spaces. We utilised the OS Open Greenspace dataset, which includes information on entrances, area sizes and the following green space classifications: public parks or gardens, allotments or community growing spaces, cemeteries, play spaces, religious grounds, bowling greens, golf courses, other sports facilities, playing fields, and tennis courts. We followed the recommendations from the Accessible Natural Greenspace Standard for England [63] and the WHO [64], and computed the counts of green spaces within various distances, then combined these with two refined measures: a distance decay weighted count and a distance decay weighted size.
In doing so, we were able to capture the likelihood of a person interacting with a green space of a certain class based on their proximity to it, assuming that the likelihood of interaction grows with the size of the space.
We calculated measures for all the green spaces combined and for each class separately; the measures for individual classes can be summed to form new variables. Specifically, we counted the number of green spaces of at least 2 ha within 300 m, 20 ha within 2000 m, and 100 ha within 5000 m. We then counted the number of green spaces within varying radii, and we calculated the distance decay weighted counts and greenspace area for varying parameters.
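The NDVI calculation described above can be illustrated with a minimal sketch. It assumes that the near-infrared and red reflectance values for the pixels inside one buffer have already been extracted; the function names and the sample values are hypothetical and are not taken from the authors' pipeline.

```python
from statistics import mean


def ndvi(nir: float, red: float) -> float:
    """NDVI for a single pixel: (NIR - RED) / (NIR + RED).

    Returns 0.0 when both bands are zero to avoid dividing by zero.
    """
    total = nir + red
    if total == 0:
        return 0.0
    return (nir - red) / total


def mean_ndvi(pixels):
    """Average NDVI over (nir, red) reflectance pairs inside a buffer."""
    return mean(ndvi(nir, red) for nir, red in pixels)


# Hypothetical (NIR, RED) reflectance pairs for the pixels of one buffer.
buffer_pixels = [(0.45, 0.10), (0.30, 0.20), (0.25, 0.25)]
average = mean_ndvi(buffer_pixels)
```

Averaging per-pixel NDVI values, rather than computing one NDVI from averaged bands, is a design choice made here for simplicity; either could be appropriate depending on the analysis.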

Public Transport
The use of public transport has, in various ways, been associated with better health [14,15]. Most studies interested in public transport accessibility quantify the proximity to public transport stops using an as-the-crow-flies distance, walking distance or travel time. We adapted these approaches and calculated the proximity to public transport stops using the 2022 National Public Transport Access Nodes (NaPTAN) database. The NaPTAN provides historic (1998-present) and nationwide information on the points of access to public transport. The spatial information ranges from a stop's location to the entrance points of larger stations and it includes a classification for each stop/entrance (i.e., bus, coach, metro, rail, and airport).
We calculated the walking distance to public transport access points for each class, including the closest available point. We counted the number of public transport stops within varying radii, and we calculated the distance decay weighted counts for varying parameters.
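The difference between a plain count within a radius and a distance decay weighted count can be sketched as follows. The decay function is not specified in this section, so the negative-exponential form and the parameter `beta` below are assumptions made purely for illustration, as are the function names and the sample distances.

```python
import math


def count_within(distances_m, radius_m):
    """Number of access points whose walking distance is within the radius."""
    return sum(1 for d in distances_m if d <= radius_m)


def decay_weighted_count(distances_m, beta=0.005):
    """Sum of exp(-beta * d) over all access points.

    ASSUMPTION: negative-exponential decay; beta (per metre) is illustrative.
    A point at distance 0 contributes 1, and distant points contribute less.
    """
    return sum(math.exp(-beta * d) for d in distances_m)


# Hypothetical walking distances (in metres) from one address to nearby stops.
stop_distances = [120.0, 340.0, 800.0, 1500.0]
plain_count = count_within(stop_distances, 400)
weighted_count = decay_weighted_count(stop_distances)
```

Unlike the plain count, the weighted count has no hard cut-off: a stop just outside a radius still contributes, only with a smaller weight.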

Walkability and Land-Use Intensity
Walkable and diverse neighbourhoods are two core policy priorities for local authorities aiming to deliver on sustainability targets. The degree to which an area is walkable has been linked to increased walking behaviour and better health [16][17][18]. A common way to evaluate the walkability of a street is to utilise street centrality measures (see Section 2.5.6), as these can be effective, particularly when additional data sources are lacking. For this work, we selected a methodology that utilised additional datasets to derive better estimations. Specifically, we used a pedestrian demand model [65] which was based on datasets used in other environmental exposure variables (i.e., OS MasterMap, OS AddressBase Premium, OS Highways, and NaPTAN), of which comparable datasets can be found worldwide. The model was based on a combination of land-use intensity, transport accessibility, street network centrality and residential population density, which resulted in a raster-based geographic data surface at a resolution of 25 × 25 m generated through interpolation. We generated this model for Bradford and used the raster-based output as the base for aggregating the walkability variables from 2018. In addition, we also included the land-use intensity subcomponent as an individual variable. The land-use intensity was based on Shannon's Diversity Index, which is calculated as follows:

$$H = -\sum_{i} p_i \ln p_i$$

where H is Shannon's Diversity Index, the sum runs over all the land uses i present, and $p_i$ is the proportion of the total land-use area occupied by land use i. We further use H to account for the equitability of the mix. For a detailed description see Dhanani et al. [65]. We calculated the average and maximum walkability and land-use intensity for varying radii.
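As a worked illustration of Shannon's Diversity Index, the following sketch computes H from a list of land-use areas. The function name and the input values are hypothetical; the pedestrian demand model in [65] involves further steps that are not reproduced here.

```python
import math


def shannon_diversity(areas):
    """Shannon's Diversity Index H = -sum(p_i * ln(p_i)).

    `areas` holds the area of each land use; p_i is each area's share of the
    total. Land uses with zero area are skipped (they contribute nothing).
    """
    total = sum(areas)
    if total <= 0:
        return 0.0
    h = 0.0
    for area in areas:
        if area > 0:
            p = area / total
            h -= p * math.log(p)
    return h


# Hypothetical land-use areas (in square metres) around one address:
# residential, retail, office, leisure.
mixed = shannon_diversity([4000.0, 3000.0, 2000.0, 1000.0])
single_use = shannon_diversity([10000.0])
```

H is zero when only one land use is present and reaches its maximum, ln(n), when n land uses occupy equal shares.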

Street Centrality
Street centrality, or street network centrality, is an established metric to quantify the spatial configuration and urban morphology of an area by computing a relative centrality metric for every street. First introduced by Hillier and Hanson [66,67] in the theory of space syntax and further developed by Turner et al. [68,69], there are two widely used metrics, i.e., angular closeness centrality (or angular integration, the potential of movement to a street segment) and angular betweenness centrality (or angular choice, the potential movement through a street segment). Several studies have used these metrics as proxies for walkability and reported associations between these metrics and health outcomes [16][17][18] and their ability to predict pedestrian movement [70]. As such, the method overcomes the oversimplification issues of alternative metrics that aim to capture the character of an urban area (e.g., 'urbanicity') by counting the number of intersections or measuring the aggregated population density.
The angular closeness centrality is based on the angular distance between every street segment and every other segment in the street network within a given radius, using the shortest angular path. The variable is calculated as follows (see [69] for a comprehensive description):

$$C_C(p_i) = \left( \sum_{k} d_{ik} \right)^{-1}$$

where $d_{ik}$ is the length of the shortest path between node $p_i$ and $p_k$. The angular betweenness centrality is calculated by generating the shortest paths between all segments within a given radius as follows:

$$C_B(p_i) = \sum_{j} \sum_{k} \frac{g_{jk}(p_i)}{g_{jk}}, \quad j < k$$

where $g_{jk}(p_i)$ is the number of shortest paths between node $p_j$ and $p_k$ which contain node $p_i$, and $g_{jk}$ is the number of all the shortest paths between $p_j$ and $p_k$. The base for this analysis was the network model using the OS Highways and OS Urban Paths information described in Section 2.3. We calculated the relative centrality (i.e., the angular closeness and angular betweenness centrality) for all the street segments in Bradford at varying radii. We measured the centralities at the residential segment and aggregated the average and maximum value within 300 m from a home address point.
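The closeness calculation can be sketched on a toy network. This is a simplified illustration rather than the procedure used in the study: it assumes that the angular cost of moving between adjacent segments is already stored as an edge weight, and it applies the radius to that same cumulative cost, whereas space syntax tools often use a separate metric radius. The graph, names and weights are hypothetical.

```python
import heapq


def shortest_costs(graph, source, radius):
    """Dijkstra search: lowest cumulative cost from `source` to every
    segment that can be reached without exceeding `radius`."""
    costs = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > costs.get(node, float("inf")):
            continue  # stale queue entry; a cheaper path was already found
        for neighbour, weight in graph[node].items():
            new_cost = cost + weight
            if new_cost <= radius and new_cost < costs.get(neighbour, float("inf")):
                costs[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return costs


def closeness(graph, source, radius=float("inf")):
    """Closeness as the inverse of the summed shortest-path costs."""
    costs = shortest_costs(graph, source, radius)
    total = sum(c for node, c in costs.items() if node != source)
    return 1.0 / total if total > 0 else 0.0


# Hypothetical segment graph; weights stand in for angular turn costs.
segments = {
    "A": {"B": 0.5, "C": 1.0},
    "B": {"A": 0.5, "C": 0.25},
    "C": {"A": 1.0, "B": 0.25},
}
closeness_a = closeness(segments, "A")
closeness_b = closeness(segments, "B")
```

In this toy network segment B scores higher than A because its summed cost to the other segments is lower, which mirrors the interpretation of closeness as the potential of movement to a segment.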

Built Form
A central concern of urban planning and urban morphology is the development of methodologies to capture differences in urban form quantitatively. We selected a prominent approach [71,72] and calculated a series of variables that describe the spatial characteristics of urban density and form, and combined these with descriptive information from secondary data (i.e., the Department for Levelling Up, Housing and Communities' Energy Performance Certificate (EPC) database). Specifically, we described built-form characteristics at the building level, as well as the block level. We measured the building footprint, building height, building volume, and building floor area utilising the OS MasterMap and Building Height data in combination with OS Highways information from 2021. This data will enable investigations of the relationship between urban densities and health outcomes.
We calculated for each address the floor-space index (FSI), ground-space index (GSI), open-space ratio (OSR), and the average number of building layers or floors (L) for the block (i.e., the continuous area bounded by streets) that contained an address point (see [71] for a comprehensive description). We further included EPC-based information on the built-form classification (i.e., detached, semi-detached, end-terrace, mid-terrace, enclosed end-terrace, and enclosed mid-terrace), the dwelling type (i.e., house, bungalow, flat, maisonette, and park home), the construction age band (i.e., before 1900, 1900-1929, 1930-1949, etc.), and the number of storeys and tenure type (i.e., rental (private), rental (social), and owner-occupied).
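The four block-level density measures can be illustrated with a short sketch that uses their commonly cited definitions: FSI is gross floor area divided by block area, GSI is building footprint divided by block area, OSR is (1 - GSI) / FSI, and L is FSI / GSI. As a simplifying assumption, gross floor area is approximated here as footprint multiplied by the number of floors; the class, function and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Building:
    footprint_m2: float
    floors: int


def block_density(buildings, block_area_m2):
    """Block-level density measures for one street-bounded block."""
    footprint = sum(b.footprint_m2 for b in buildings)
    # ASSUMPTION: gross floor area approximated as footprint x floors.
    floor_area = sum(b.footprint_m2 * b.floors for b in buildings)
    fsi = floor_area / block_area_m2   # building intensity
    gsi = footprint / block_area_m2    # coverage
    osr = (1 - gsi) / fsi if fsi > 0 else float("inf")  # open space per m2 of floor
    layers = fsi / gsi if gsi > 0 else 0.0               # average number of floors
    return {"FSI": fsi, "GSI": gsi, "OSR": osr, "L": layers}


# Hypothetical block of 1000 m2 containing two buildings.
block = block_density([Building(200.0, 2), Building(100.0, 3)], 1000.0)
```

For this block, 30% of the ground is built on and the floor area amounts to 70% of the block area, giving an average of roughly 2.3 floors.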

Indoor Qualities
The COVID-19 pandemic has uncovered the importance of indoor qualities for physical and mental health [22,23], but capturing such indoor qualities at scale is a difficult task. For this reason, we selected a variety of variables of the EPC's secondary information which could be used as a proxy for the indoor qualities. The variables included data on energy consumption, energy consumption potential, lighting, heating and hot water cost, size of glazed areas, floor area and height, number of heated rooms, number of habitable rooms, and the number of extensions.

Food Environments
Numerous studies have reported associations between fast-food exposure and health outcomes such as obesity in adults and children; however, the evidence is mixed [37]. Such mixed evidence might be caused by differences and imprecisions in measuring the exposure, which is often based on crude area counts. To address such shortcomings, we measured the exposure to fast-food outlets at a high spatial precision by calculating the accessibility as the walking distance from an address to fast-food outlets at varying distances, and then we calculated the proportion of fast-food to all food offerings around the home. For this, we adapted a method by [20] using 2021 secondary OS Points of Interest (POI) information to identify the fast-food outlets through the existing classification within the POI data in combination with text-based keywords applied to an outlet's name (see Appendix B in [20] for the used keyword set).
We counted all the food outlets and the fast-food outlets, computed the ratio of fast-food outlets to all food outlets within varying radii, and calculated the distance decay weighted counts.
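The counting and ratio steps described above can be sketched as follows. The keyword list, the `poi_class` label and the record layout are hypothetical stand-ins; the actual keyword set is given in Appendix B of [20] and is not reproduced here.

```python
# HYPOTHETICAL keywords for illustration only; not the keyword set from [20].
FAST_FOOD_KEYWORDS = ("pizza", "kebab", "fried chicken", "burger")


def is_fast_food(outlet):
    """Flag an outlet via its POI class or a keyword match on its name."""
    name = outlet["name"].lower()
    return outlet["poi_class"] == "fast food" or any(
        keyword in name for keyword in FAST_FOOD_KEYWORDS
    )


def fast_food_exposure(outlets, radius_m):
    """Counts of all food and fast-food outlets within a radius, plus ratio."""
    nearby = [o for o in outlets if o["distance_m"] <= radius_m]
    fast = [o for o in nearby if is_fast_food(o)]
    ratio = len(fast) / len(nearby) if nearby else 0.0
    return {"all_food": len(nearby), "fast_food": len(fast), "ratio": ratio}


# Hypothetical food outlets with walking distances from one home address.
outlets = [
    {"name": "Corner Grocer", "poi_class": "convenience store", "distance_m": 150},
    {"name": "Best Kebab House", "poi_class": "restaurant", "distance_m": 300},
    {"name": "Quick Bites", "poi_class": "fast food", "distance_m": 380},
    {"name": "City Burger", "poi_class": "restaurant", "distance_m": 900},
]
within_400 = fast_food_exposure(outlets, 400)
```

Combining the class-based and name-based checks means an outlet classified as a restaurant can still be flagged when its name matches a keyword, which is the motivation for using both sources.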

Results: Built Environment Indicators
The result of the aforementioned methodology is a set of more than 500 environmental indicators (EIs) outlined in Table 1. Figure 4 shows a visualisation of a variable from the food environment domain, i.e., the fast-food exposure using a distance decay weighted count. The mapping provides geographic insights into the spatial distribution of fast-food outlets and the difference in exposure for each resident in Bradford. Urbanised areas feature a disproportionate number of fast-food outlets, with the highest level of exposure in neighbourhoods around the centre. Suburban and rural neighbourhoods, on the other hand, feature comparatively less exposure.
In addition to the insights into the geographic distribution of fast-food outlets, we selected 11 exemplary EIs (see Table 2) to illustrate the environmental exposure of an average person in Bradford. Figures 5 and 6 show the mappings of these, demonstrating the potential insights that these data can provide both at city and local levels.
We combined this selection of exemplary indicators with demographic information from the 2021 Census to produce a synthetic person: Samina. Samina was a 12-year-old British Pakistani girl. She lived with her two parents in a 5.70 m high mid-terrace house of 78.00 square metres. During a short morning walk from home, she would be passed by 182 cars and exposed to 9.01 micrograms per cubic metre of particulate matter 2.5, which is comparable to the annual mean concentration in the UK at urban background monitoring sites in 2018 and just below the WHO guideline [73] of an annual mean of 10 micrograms per cubic metre. The streets that she would traverse would not be very walkable (0.84), feature little diversity of shops (0.06), and have a greenery level of 0.22, which is a relatively low green environment when compared to the highest values of places in Bradford. Samina would be close to public transport (182.99 m), but not very close to many other streets in the city (a 1419.43 street centrality), and closer to a fast-food outlet (561.89 m) than to a public park or garden (889.26 m).

[Figure 5. Mapping of exemplary EIs (see Table 2 for details) for the entire metropolitan district of Bradford and a detailed zoom-in on the Toller ward. EIs include: (a,b) average exposure to PM 2.5; (c,d) number of vehicles during the morning period; (e,f) average level of greenery; (g,h) distance to the closest public park or garden; (i,j) distance to the closest bus stop; and (k,l) distance to the closest fast-food outlet. © OS and Crown copyright 2022.]

[Figure 6. Mapping of exemplary EIs (see Table 2 for details) for the entire metropolitan district of Bradford and a detailed zoom-in on the Toller ward. EIs include: (a,b) diversity of shops; (c,d) average level of walkability; (e,f) closeness to other things in the city; (g,h) height of the home; and (i,j) size of the home. © OS and Crown copyright 2022.]
While this is only a single example, it demonstrates how powerful such a dataset can be when multiplied across a cohort, or indeed a city's population. Not only does it enable the researcher to detail each individual's susceptibility to ecological influences on their health (and their confounders), but it also allows a policy-maker to identify, through geographical mapping of the data, where interventions, such as reducing the sources of pollution or improving access to green spaces, should be targeted most urgently.

Discussion
Constructing large-scale, longitudinal, individual-level environment exposure variables poses a series of challenges, which we divide into (a) general data challenges and (b) measurement challenges. General data challenges include difficulties around the availability and comparability of historic data. Availability and comparability challenges include differences in quality, spatial resolution, precision, completeness, and classification, while measurement challenges consist of fundamental questions touching upon definition, classification and operationalisation issues.
To address these, we selected datasets for which comparable alternative datasets are available globally and which are consistent throughout the years (see Table 3). For datasets that show little temporal variation, such as the entrance points of parks (i.e., OS Open Greenspace), using data from the earliest available dataset for years in which no data is otherwise available constitutes a feasible alternative. To provide an illustration of the comparability challenges, the OS AddressBase Premium data has been available since 2004; however, a consistent address classification has existed only since 2013. This means that while spatial information exists prior to 2013, the identification of residential or non-domestic functions will need to be inferred through the use of alternative classifications such as the Valuation Office Agency's Primary Description and Special Category (Scat) Codes. This introduces comparability issues as the inferred classifications are not identical to the existing land-use classes.
To address these, we selected datasets for which comparable alte available globally and which are consistent throughout the years (see from the earliest available dataset in the years where otherwise no dat tutes a feasible alternative for datasets that show little temporal varia trance points of parks (i.e., OS Open Greenspace). To provide an illust rability challenges, the OS AddressBase Premium data has been availa ever, a consistent address classification has existed only since 2013. Th spatial information exists prior to 2013, the identification of resident functions will need to be inferred through the use of alternative classi Valuation Office Agency's Primary Description and Special Category ( troduces comparability issues as the inferred classifications are not ide land-use classes. Furthermore, we trialled variables from the food environment do to fast-food in conjunction with data from the BiB longitudinal cohort we looked at the association between fast-food exposure and childhood ': dataset is not available. Furthermore, we trialled variables from the food environment domain, i.e., exposure to fast-food in conjunction with data from the BiB longitudinal cohort survey. Specifically, we looked at the association between fast-food exposure and childhood obesity. For this, we first applied our outlined method and calculated the exposure variables for all the residential addresses and all the school addresses in Bradford, as well as journeys between the two. We then geo-referenced the BiB participants' address information as well as the school address information and linked these to their respective UPRNs. These UPRNs were then used to link the environmental exposure information to the BiB participants in a safe data environment. 
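The linkage step described above (address to UPRN, then UPRN to pre-computed exposure) can be sketched as follows. This is a minimal illustration only, not the pipeline used in the study: the lookup tables, field names (`ffo_distance_m`, `park_distance_m`), UPRN values and the `normalise` helper are all hypothetical, and real address matching is considerably more involved than an exact string lookup.

```python
# Hypothetical gazetteer: normalised address string -> UPRN (values are made up).
ADDRESS_TO_UPRN = {
    "1 example street, bradford": "100000000001",
    "2 sample road, bradford": "100000000002",
}

# Hypothetical exposure table, pre-computed once per UPRN (illustrative numbers).
EXPOSURE_BY_UPRN = {
    "100000000001": {"ffo_distance_m": 500.0, "park_distance_m": 900.0},
    "100000000002": {"ffo_distance_m": 240.0, "park_distance_m": 1200.0},
}


def normalise(address: str) -> str:
    """Lower-case the address and collapse whitespace before lookup."""
    return " ".join(address.lower().split())


def link_exposures(participants):
    """Attach exposure values to participants via their UPRN.

    Returns (linked, unmatched). Linked records carry only the participant
    identifier and the exposure values; the address and the UPRN are dropped
    so that the output does not reveal where a participant lives.
    """
    linked, unmatched = [], []
    for person in participants:
        uprn = ADDRESS_TO_UPRN.get(normalise(person["address"]))
        exposure = EXPOSURE_BY_UPRN.get(uprn) if uprn is not None else None
        if exposure is None:
            unmatched.append(person["participant_id"])
            continue
        linked.append({"participant_id": person["participant_id"], **exposure})
    return linked, unmatched


participants = [
    {"participant_id": "P1", "address": " 1 Example Street,  Bradford "},
    {"participant_id": "P2", "address": "address not in gazetteer"},
]
linked, unmatched = link_exposures(participants)
```

Keeping the unmatched identifiers separate makes it possible to report the linkage rate alongside any analysis, which matters when missing matches are not random.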
Our analysis of the exposure around the home showed that increasing spatial precision in the quantification of the exposure to fast-food outlets (FFOs) does not lead to differences in the associations with childhood obesity, which challenges previous findings reporting associations between the two [74]. This analysis highlighted a series of measurement challenges. For example, there is little agreement as to what constitutes a fast-food outlet, and differences in the definitions likely explain the heterogeneity in the reported associations [37]. In addition, a variety of potential spatial and non-spatial confounders that were not captured by our exposure measurements might be at play and would have to be controlled for when utilising environmental exposure variables. In the context of fast-food exposure, these could include, among others, genetic predisposition, behavioural differences, financial situation, or level of deprivation. Each of these might also be expressed as a spatio-temporal urban self-selection process. For example, while obesity has been linked to economic deprivation, the density of fast-food outlets within reach of a home address seems not to be the main driving factor [75]. This highlights the importance of controlling for such confounders when utilising spatial information, and it points to the importance of measuring proximity by street distance (rather than as an average across an area, or 'as the crow flies').
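To illustrate why street distance and 'as the crow flies' distance can diverge, the sketch below compares the two on a toy street graph. It is an assumption-laden example: the node names, coordinates and street segments are invented, and shortest-path routing with Dijkstra's algorithm is used here as one common way to obtain a street distance, not necessarily the routing procedure applied in the study.

```python
import heapq
import math

# Hypothetical street junctions with planar coordinates in metres.
NODES = {
    "home": (0.0, 0.0),
    "corner": (300.0, 0.0),
    "far": (0.0, 600.0),
    "outlet": (300.0, 400.0),
}
# Hypothetical street segments; note there is no direct home-outlet street.
STREETS = [("home", "corner"), ("corner", "outlet"), ("home", "far"), ("far", "outlet")]


def build_graph(nodes, streets):
    """Build an undirected adjacency list weighted by segment length."""
    graph = {name: [] for name in nodes}
    for a, b in streets:
        length = math.dist(nodes[a], nodes[b])
        graph[a].append((b, length))
        graph[b].append((a, length))
    return graph


def street_distance(graph, source, target):
    """Shortest path length along the street graph (Dijkstra's algorithm)."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, math.inf):
            continue  # stale queue entry
        for neighbour, length in graph[node]:
            candidate = dist + length
            if candidate < best.get(neighbour, math.inf):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return math.inf  # target not reachable


graph = build_graph(NODES, STREETS)
crow_flies = math.dist(NODES["home"], NODES["outlet"])  # straight line
by_street = street_distance(graph, "home", "outlet")    # along streets
```

In this toy layout the straight-line distance is 500 m while the shortest street route is 700 m, so a 600 m buffer would count the outlet as reachable under one definition and not the other.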
Besides capturing the exposure at the spatial resolution at which it is perceived, the strength of our approach is its generalisability: it is applicable across the globe and to any type of environmental exposure through time. For example, future studies that wish to apply the proposed methodology in international contexts may consider the inclusion of additional variables relevant to the respective local environmental conditions, e.g., surface temperature information derived from satellite imagery for areas affected by more extreme climates. The method not only enables investigations into the associations between exposure types and health outcomes but also provides the opportunity for causal inference research designs. An example of such a research design, currently being undertaken by our researchers, is to utilise involuntary house move information to investigate the causal effect of air pollution on respiratory diseases using the Connected Bradford database. Opportunities for causal inference research designs exist not only within the presented Connected Bradford database but for any longitudinal cohort survey. We have identified a series of longitudinal cohort surveys in the UK, i.e., the National Child Development Study (NCDS), 1970 British Cohort Study (BCS70), UK Household Longitudinal Study (UKHLS), British Household Panel Survey (BHPS), Millennium Cohort Study (MCS) and Next Steps (previously known as the Longitudinal Study of Young People in England (LSYPE)) (see Table 4), which can adopt our approach by linking a participant's address to their UPRNs (subject to the necessary approvals) and subsequently to their environmental exposure without compromising their identity. Such nationwide longitudinal cohort data offer untapped potential for identifying the causal links between the built environment and health outcomes.
is comparable to the annual mean concentration in the UK at urban background monitoring sites in 2018 and just below the WHO guideline [73] of an annual mean of 10 micrograms per cubic metre. The streets that she would traverse would not be very walkable (0.84), would feature little diversity of shops (0.06), and would have a greenery level of 0.22, which is relatively low compared with the highest values in Bradford. Samina would be close to public transport (182.99 m), but not very close to many other streets in the city (a street centrality of 1419.43), and closer to a fast-food outlet (561.89 m) than to a public park or garden (889.26 m).
While this is only a single example, it demonstrates how powerful such a dataset can be when multiplied across a cohort, or indeed a city's population. Not only does it enable the researcher to detail each individual's susceptibility to ecological influences on their health (and their confounders), but it also allows a policy-maker to identify, through geographical mapping of the data, where interventions, such as reducing the sources of pollution or improving access to green spaces, should be targeted most urgently.

Discussion
Constructing large-scale, longitudinal, individual-level environmental exposure variables poses a series of challenges, which we divide into (a) general data challenges and (b) measurement challenges. General data challenges include difficulties around the availability and comparability of historic data, which arise from differences in quality, spatial resolution, precision, completeness, and classification. Measurement challenges consist of fundamental questions of definition, classification and operationalisation.
To address these, we selected datasets for which comparable alternative datasets are available globally and which are consistent throughout the years (see Table 3). For datasets that show little temporal variation, such as the entrance points of parks (i.e., OS Open Greenspace), using data from the earliest available dataset for the years in which no data is otherwise available constitutes a feasible alternative. To illustrate the comparability challenges, the OS AddressBase Premium data has been available since 2004; however, a consistent address classification has existed only since 2013. This means that while spatial information exists prior to 2013, the identification of residential or non-domestic functions will need to be inferred through the use of alternative classifications such as the Valuation Office Agency's Primary Description and Special Category (Scat) Codes. This introduces comparability issues, as the inferred classifications are not identical to the existing land-use classes.
Furthermore, we trialled variables from the food environment domain, i.e., exposure to fast food, in conjunction with data from the BiB longitudinal cohort survey. Specifically, we looked at the association between fast-food exposure and childhood obesity.
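The backfilling of slowly changing datasets with their earliest available release can be illustrated with a minimal sketch. It is not the procedure used to build the Connected Bradford indicators: the function name `backfill_from_earliest`, the years, and the distance values are hypothetical. The sketch assumes one exposure value per year, marks backfilled years as imputed, and leaves gaps after the earliest release unfilled.

```python
from typing import Dict, Iterable, Optional, Tuple


def backfill_from_earliest(
    observed: Dict[int, float], years: Iterable[int]
) -> Dict[int, Tuple[Optional[float], bool]]:
    """Return {year: (value, imputed)} for every requested year.

    Years before the earliest observation take the earliest observed value
    and are flagged as imputed. Later years without data stay None
    (an assumption of this sketch; the source does not address such gaps).
    """
    if not observed:
        raise ValueError("at least one observed year is required")
    earliest_year = min(observed)
    filled: Dict[int, Tuple[Optional[float], bool]] = {}
    for year in years:
        if year in observed:
            filled[year] = (observed[year], False)
        elif year < earliest_year:
            filled[year] = (observed[earliest_year], True)
        else:
            filled[year] = (None, False)
    return filled


# Hypothetical distances (m) from one residence to the nearest park entrance.
park_distance_m = {2017: 900.0, 2018: 880.0}
series = backfill_from_earliest(park_distance_m, range(2014, 2020))
for year, (value, imputed) in series.items():
    print(year, value, "imputed" if imputed else "")
```

Keeping the imputed flag alongside each value would allow later analyses to exclude backfilled years or to test how sensitive results are to them.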

Conclusions
This paper outlined the theoretical and technical foundation of Connected Bradford's environmental exposure indicators. The dataset constitutes a unique information source providing high-resolution geospatial information on the exposure to and within the built environment for the entire population of Bradford. Our street- and building-level information has been captured at a scale that enables the formulation of guidelines and spatial planning policies for the modification of the built environment, a critical gap in the current knowledge. This effort will enable pivotal research into the relationship and causal links between the built environment and health, informing planning and policy-making.
Moreover, it has the potential to serve as a template for nationwide replication. The recent WHO priorities for urban health report [1] emphasised the importance of building city-level evidence on urban environments and health outcomes in order to obtain "a clearer picture of the association between urban exposures and health across the life course". We have proposed here a method for doing so that overcomes some fundamental challenges of capturing precise, meaningful data at a scale that can improve the quality of evidence for research in this urgent policy domain.