A Conceptual Framework for the Assessment of Cumulative Exposure to Air Pollution at a Fine Spatial Scale

Many epidemiological studies examining long-term health effects of exposure to air pollutants have characterized exposure by the outdoor air concentrations at sites that may be distant from subjects' residences at different points in time. The temporal and spatial mobility of subjects and the spatial scale of exposure assessment could thus lead to misclassification in the cumulative exposure estimation. This paper attempts to fill the gap regarding cumulative exposure assessment to air pollution at a fine spatial scale in epidemiological studies investigating long-term health effects. We propose a conceptual framework showing how major difficulties in cumulative long-term exposure assessment could be surmounted. We then illustrate this conceptual model with the case of exposure to NO2, following two steps: (i) retrospective reconstitution of NO2 concentrations at a fine spatial scale; and (ii) a novel approach to assigning the time-relevant exposure estimates at the census block level, using all available data on residential mobility throughout a 10- to 20-year period prior to that for which the health events are to be detected. Our conceptual framework is both flexible and convenient for the needs of different epidemiological study designs.


Introduction
The World Health Organization has estimated that deaths attributed to outdoor air pollution are predominantly due to ischemic heart diseases and strokes (80%), followed by chronic obstructive pulmonary diseases or acute lower respiratory infections (14%) and lung cancer (6%) [1].
Several original papers, reviews and meta-analyses have documented that mortality (including mortality from all causes) and chronic diseases (especially cancer, cardiovascular and respiratory disease) are associated with long-term exposure to particulate air pollution, notably to particles with diameters of 10 µm or less (PM 10 ) and particles with diameters of 2.5 µm or less (PM 2.5 ) and their constituents [2-5], as well as to nitrogen dioxide (NO 2 ) [2,4-7] and sulfur dioxide (SO 2 ) [2,4-8]. Regarding all-cause mortality, these studies have revealed an increase in mortality with long-term exposure to elemental carbon (pooled estimate: +6% per 1 µg/m 3 ; 95% CI: 5%-7%), NO 2 (pooled [...]
[...] aspect results from long-term residential mobility patterns that describe the geographic mobility of the population during the study follow-up, when exposure is assessed over long periods. Because of the potential influence of residential mobility, its consideration in cumulative exposure assessment is a crucial component for all epidemiological study designs exploring the effects of long-term exposures.

The Purpose of This Paper
To our knowledge, no epidemiological study investigating the health effects of long-term exposure to air pollution has accounted for residential mobility in assessing cumulative exposure at a fine spatial scale. This paper is an attempt to fill this gap via the development of a conceptual model for the assessment of cumulative exposure to air pollution at a fine spatial scale. Our work is part of a project that aims to investigate the long-term effects of air pollution on breast cancer in Paris, France, using an ecological approach whereby all data (regarding outcome, exposure, confounders or effect modifiers) will be collected at the smallest geographical level currently available in France, namely census block level (known as "IRIS", a French acronym for "blocks for incorporating statistical information").
The present paper comprises four sections:
(i) We present an overview of exposure assessment in the exploration of long-term effects in epidemiological studies.
(ii) We propose a conceptual framework for the retrospective assessment of air pollution, including two components: (1) retrospective reconstitution of NO 2 concentrations at a fine spatial scale (NO 2 has been chosen as the index pollutant because its spatial variability is higher than that of many other air pollutants); and (2) a novel approach to assigning time-relevant exposure estimates at the census block level, using all available data on residential mobility throughout the 10- to 20-year period prior to that for which the health events are to be detected.
(iii) We propose an application of this framework using French data to illustrate preliminary results and to describe the feasibility of the different steps of our framework.
(iv) We discuss the need for this conceptual framework, and its subsequent value.

Retrospective Air Pollution Assessment: An Overview
In this section, we present an overview of assessment approaches of long-term exposure to outdoor pollution that have been developed in epidemiological studies over the past decade. It should be noted that this paper is not intended as a systematic review of modeling approaches and geospatial methods, already available in a previous review [25].

Retrospective Reconstitution of Ambient Air Concentrations and Exposure Assignment
In epidemiological studies, long-term exposure has been characterized by outdoor concentrations at the neighborhood or postal address level. These concentrations were either assigned from a central site monitor (or from the nearest monitors) or derived from modeling approaches.
Several studies used only measurements from monitoring stations, some estimating the average of monitors within the study spatial unit [3,12-14,23,26], while others used nearest monitor concentrations [2,17,28-30], or interpolated values provided by fixed-site monitors [16,24]. Hystad et al. combined the spatial patterns detected by original satellite estimates and spatio-temporal patterns from fixed-site monitoring data to estimate ambient air concentrations of NO 2 for each year from 1975 to 1994 [23]. With a view to assessing exposure to vehicle emissions, Wei et al. used State NO x emission data obtained from the air database for 1990 (the earliest year available) to study the incidence of female breast cancer observed during the period 1986-2002 [31], whereas Hystad et al. used proximity to the road network to derive the number of years participants had resided within 50, 100 and 300 meters of a highway or major road during the 20-year exposure period [23]. To investigate the long-term effects of urban air pollution in a case-control study of lung cancer in Stockholm, Nyberg et al. estimated annual levels of SO 2 and NO x /NO 2 for each year between 1950 and 1990 using retrospective emission data and linear extrapolation and interpolation [8]. Using a spatial smoothing model and a Geographic Information System, Hart et al. estimated annual average levels of PM 10 , SO 2 , and NO 2 from 1985 through 2000 [2]. Several cohort studies used Land Use Regression (LUR) models to assess small-scale spatial levels of outdoor pollution and assigned exposure to all participants. Using this approach in a Canadian province, Hystad et al. calculated an average concentration of NO 2 for the 1975-1994 exposure period [23]. In the European ESCAPE study, the authors used LUR models to estimate concentrations of NO 2 , PM 10 and PM 2.5 in 20 European areas, with 20 sites per area [32,33].
Finally, thanks to increasingly available monitoring data and geostatistical modeling, more sophisticated air dispersion models (such as the CHIMERE chemistry-transport model) were recently used to retrospectively model outdoor air pollution in France and reconstruct annual average concentrations of NO 2 , PM 10 and PM 2.5 from 1989 to 2008 at a fine spatial scale [27].
Hence, most environmental studies characterized exposure levels using outdoor concentrations at sites that may be distant from the participants' residence location. The spatial resolution of the study units varies across these studies: city average level in the Harvard Six Cities Study [3,12,13] as opposed to zip code in the ACS [14-16], Medicare national cohort [19] and French Gazel studies [7]. Other spatial scales of exposure assignment were also used, such as district [30,34], enumeration area [35] and census tract [36]. Moreover, while outcome data and confounder/adjustment variables are available at an individual level, most of these studies, described as being of a semi-ecological design, employ a group-level assessment of exposure.

Retrospective Reconstitution of Cumulative Exposure Levels
When the effects investigated stem from long-term cumulative exposure, one important issue that complicates exposure assessment is accounting for residential changes among the study population [37].
The difficulty of assembling large cohorts and following subjects over a long period of time has led authors to (i) exclude a large number of participants due to missing residential histories [23] or to select subjects who did not move [38]; or (ii) assign a unique annual exposure to PM 2.5 and NO 2 to the cohort members based on the last known home address [2], thus overlooking residential mobility. In some studies, different points in time were used as markers of average air pollution concentrations over the follow-up period, such as the concentration at the address at study inclusion [39], participant addresses at follow-up [40], or a combination of both [41].
Examples of efforts to reconstitute individual trajectories over decades are the paper by Hystad et al., 2015, who assigned mean exposure levels to traffic-related air pollution from participant residential histories derived for each year over the 20-year period (1975-1994) [23]; and the French Gazel cohort, where the authors assigned annual air pollutant concentrations of PM 10 , PM 2.5 , NO 2 , O 3 , SO 2 , and benzene to participants on the basis of their five-digit zip code [7]. Such efforts are valuable, as recently shown by Andersen et al., who found that effect estimates drawn from epidemiological studies were stronger when accounting for residential mobility than when not [42]. Another illustration of the benefit of taking residential mobility into account is provided by Gan et al., who demonstrated that subjects whose exposure levels had been reduced by moving to a less polluted area had a lower risk of coronary heart disease than those whose exposure was more constant throughout the entire follow-up period [43].
This residential history issue is more difficult to overcome in ecological studies, where no individual information is available on residential mobility [31]. The following sections propose ways of overcoming this limitation.

A Conceptual Framework for Retrospective Assessment of Air Pollution
Our framework aims to address the two forms of misclassification described above. To illustrate this framework, we propose ways of (i) retrospectively reconstituting NO 2 concentrations at the French census block level in the city of Paris (Section 3.1) and (ii) assigning time-relevant exposure estimates at this spatial scale, using all available data on mobility throughout the 10- to 20-year follow-up period (Section 3.2). We then apply the approach to data from Paris to present preliminary results and to assess the feasibility of the different steps of our framework (Section 4).

Retrospective Modeling of Pollutant Concentrations at a Fine Spatial Scale
To estimate the annual concentrations of NO 2 since the late 1990s at the census block level, a series of three steps is required:
Step 1: Modeling the annual averages of NO 2 concentrations between 2002 and 2012 at the census block level (the period for which data should be available for all census blocks);
Step 2: Assessing the spatial area representative of the air quality monitoring stations in the study area;
Step 3: Modeling the temporal trend of (i) annual averages between 2002 and 2012 at the census block level and (ii) daily NO 2 concentrations obtained from the monitoring stations, and subsequently reconstructing the annual averages before 2002 at the census block level with these data.

Step 1: Modeling the Annual Averages of NO 2 Concentrations between 2002 and 2012 at the Census Block Level
Annual mean concentrations of NO 2 were estimated at a fine spatial scale (IRIS) throughout the 2002-2012 period by Airparif (the Paris metropolitan area air quality monitoring network: http://www.airparif.fr) [44].
Firstly, NO 2 background concentrations were determined by combining monitored NO 2 concentrations from monitoring stations and those modeled at a regional scale from the ESMERALDA inter-regional platform for air mapping and forecasting (www.esmeralda-web.fr). The ISATIS software was used to conduct geostatistical analysis for data assimilation. Secondly, NO 2 road traffic concentrations estimated from the STREET software model [45] were added to NO 2 background concentrations. The software evaluates the annual levels from roads according to traffic characteristics and the close environment, as well as weather conditions. Several types of input data were used, including point sources and road transport emissions and meteorological data (temperature, wind speed and direction, relative humidity, barometric pressure). More than 200 point sources were selected from the regional emission inventory. Emissions for road traffic were estimated using the regional traffic network and the COPERT III European database for the 2002-2006 period, and COPERT IV for the 2007-2012 period. Concerning meteorological data, the Mesoscale Meteorological model (MM5: www.mmm.ucar.edu/mm5) developed by the Division of the NCAR Earth System Laboratory (NESL) was used. The areas directly influenced by the road axes, and the decrease in concentrations with distance from these axes, are taken into account. This decrease is estimated through measurements at sites influenced by road traffic.

Step 2: Assessing the Spatial Area Representative of the Air Quality Monitoring Stations
Using daily NO 2 concentrations measured by fixed monitoring stations (including background stations and traffic stations) located within the city of Paris and available over the 2002-2012 period, we identified the area for which the air quality monitoring stations provided good estimates of daily concentration variability.
The hierarchical agglomerative clustering (HAC) is the most commonly used approach as it provides intuitive similarity relationships between any one sample and the entire dataset. The HAC was chosen and applied for Paris to associate each census block with a background permanent monitoring station. This cluster analysis allowed grouping census blocks and stations into clusters (groups) on the basis of similarities within a class and dissimilarities between different clusters.
Each census block was then assigned by Airparif to the monitoring station (named the "index" monitor) best representing overall NO 2 air quality within the census block, using clustering methods [46].
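The grouping idea can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the procedure used by Airparif: the distance (1 minus the Pearson correlation of daily series), the average linkage, and the series names `station_1`, `block_A`, etc. are all hypothetical choices made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def hac(series, n_clusters):
    """Average-linkage agglomerative clustering of named series.

    Assumption for this sketch: the dissimilarity between two series is
    1 - Pearson correlation. The source does not state which metric was used.
    """
    names = list(series)
    dist = {frozenset(pair): 1.0 - pearson(series[pair[0]], series[pair[1]])
            for pair in combinations(names, 2)}
    clusters = [[name] for name in names]  # start with one cluster per series

    def linkage(a, b):
        # Average pairwise distance between the members of clusters a and b.
        total = sum(dist[frozenset((i, j))] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Merge the two closest clusters, then repeat.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return clusters


# Hypothetical daily NO2 series (µg/m3) for two stations and two census blocks.
toy = {
    "station_1": [40, 55, 38, 60, 45],
    "block_A": [42, 58, 39, 63, 47],
    "station_2": [30, 24, 33, 22, 28],
    "block_B": [31, 25, 35, 23, 30],
}
groups = hac(toy, n_clusters=2)
print(groups)
```

In this toy example each census block ends up in the same group as the station whose day-to-day variations it follows most closely, which is the role played by the "index" monitor.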

Step 3: Reconstitution of the Retrospective Annual Averages Prior to 2002 at the Census Block Level
In this final step, we combine the concentrations measured by monitoring stations with those estimated at the census block level for the associated "index" monitor. First, we assess temporal trends of both the daily concentrations measured at the Paris monitoring stations and the annual concentrations estimated at census blocks during the study period (2002-2012). Thus, the NO 2 variability (daily at each "index" monitor, and annually at each census block) will be modeled using time-series analysis.
Second, we suggest weighting the annual average for the census block using the coefficient resulting from the time-series analysis of its "index" monitor. In other words, the calculation of retrospective NO 2 concentrations has two components: a census block component corresponding to the NO 2 annual trend, and a local component corresponding to the "index" monitor. Therefore, we use the daily trend coefficient to weight the annual observations.
Figure 1 describes the two main steps required to fully assess cumulative exposure over the disease's long latency period. This framework explains how residential mobility can be integrated as a correction factor to retrospectively estimate cumulative exposure at the census block level.

First, we retrieve the information from the national census that describes the proportion of the residents of each census block that lived in another census block in the past. Under the hypothesis that people living in census block N in year j could have lived in another census block one year previously, one can combine exposure levels for the different places of residence (namely U, V, W and C, in our present example) into a single estimate.
Using a mobility matrix, constructed from the national census database, describing inhabitant movements within the study area and from outside it (i.e., another municipality, department or region) from one year to the previous year, the model estimates cumulative population exposure as an average of $E_{Nj}$ and the weighted $E_{i(j-1)}$, where i represents the census blocks of residence in year (j-1). In our example, i can be the census block U, V, W and C.
The weighted term is written term by term in Equation (1):

$$\sum_{i \in \{N,u,v,w,c\}} P_{iN}\,E_{i(j-1)} = P_{NN}\,E_{N(j-1)} + \left[P_{uN}\,E_{u(j-1)} + P_{vN}\,E_{v(j-1)} + P_{wN}\,E_{w(j-1)}\right] + \left[P_{cN}\,E_{c(j-1)}\right] \qquad (1)$$

where:
• $P_{NN}\,E_{N(j-1)}$ characterizes the sedentary population; $P_{NN}$ is the probability that the population residing in census block N already lived in the same census block N in year (j-1);
• $P_{uN}\,E_{u(j-1)} + P_{vN}\,E_{v(j-1)} + P_{wN}\,E_{w(j-1)}$ characterizes the intra-area movement of the population; $P_{uN}$, $P_{vN}$ and $P_{wN}$ are the probabilities that the population residing in census block N lived, respectively, in census block U, V or W in year (j-1), and $E_{u(j-1)}$, $E_{v(j-1)}$ and $E_{w(j-1)}$ are, respectively, the average exposure levels of census block U, V or W in year (j-1);
• $P_{cN}\,E_{c(j-1)}$ characterizes the population movement from outside the study area; $P_{cN}$ is the probability that the population residing in census block N lived in location C (municipality, department or region) in year (j-1), and $E_{c(j-1)}$ is the average exposure level of location C in year (j-1), extracted from the meta-analysis of French studies conducted at the municipal and regional scales (e.g., Bentayeb et al., 2014).
From Equation (1), we deduce that a low level of mobility for census block N is quantified by a probability $P_{NN}$ close to 1 (approximately 100% of inhabitants live in census block N in year j and in year (j-1)) and, as a consequence, the other probabilities $P_{uN}$, $P_{vN}$, $P_{wN}$ and $P_{cN}$ will be close to 0 (and vice versa). Between years (j-1) and (j-2), Equation (1) is again applied for each new location (N, U, V, W and C) to assess exposure by taking into account population mobility, and so on across the latency period of the disease of interest.
In order to generalize Equation (1) for a given latency period designated as L, the cumulative exposure of census block N over the L previous years, designated as $\mathrm{Cumul}[E_{N(j-L)}]$, is defined according to Equation (2) as a ratio whose numerator is calculated from Equation (1).
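As a worked illustration of the weighted term in Equation (1), the following minimal Python sketch computes the mobility-weighted exposure of census block N for one year. All probabilities and exposure values are hypothetical numbers chosen for the example, and the function name `mobility_weighted_exposure` is ours; the sketch covers the one-year weighting only and does not attempt to reproduce Equation (2).

```python
def mobility_weighted_exposure(probabilities, exposures):
    """Sum of P_iN * E_i(j-1) over all previous places of residence i.

    probabilities: {origin: probability that residents of N lived in origin in year j-1}
    exposures:     {origin: average exposure level of origin in year j-1}
    """
    total_p = sum(probabilities.values())
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError(f"probabilities must sum to 1, got {total_p}")
    return sum(p * exposures[origin] for origin, p in probabilities.items())


# Hypothetical values: 80% sedentary, 15% intra-area movers, 5% from outside (C).
p_to_n = {"N": 0.80, "U": 0.08, "V": 0.05, "W": 0.02, "C": 0.05}
e_prev = {"N": 50.0, "U": 40.0, "V": 45.0, "W": 60.0, "C": 30.0}  # µg/m3, year j-1

weighted = mobility_weighted_exposure(p_to_n, e_prev)

# Limiting case discussed in the text: with no mobility (P_NN = 1),
# the weighted exposure reduces to the exposure of census block N itself.
no_mobility = mobility_weighted_exposure({"N": 1.0}, e_prev)

print(round(weighted, 2), no_mobility)
```

With these illustrative inputs, the weighted exposure falls below the exposure of block N alone because most movers come from less polluted places; other inputs could of course shift it in the opposite direction.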

Study Setting
The city of Paris (Ile-de-France region, capital of France) is subdivided into 20 "arrondissements" and 992 census blocks for a total population of approximately 2,250,000 inhabitants. Designed by the French National Census Bureau (INSEE), the census block named "IRIS" constitutes the smallest census unit area whose aggregated data can be routinely used.
However, some data were available only at the arrondissement level when this analysis was undertaken (e.g., residential mobility data). Therefore, in this first application, some steps of our framework were carried out only at the arrondissement level, to illustrate the approach.
Annual NO 2 concentrations are described in Table 1. Higher NO 2 annual means were localized in the northern part of Paris. The spatio-temporal trend is given in Figure 2. The distribution of NO 2 annual concentrations across all census blocks, expressed as (min; max) values, is shown in Figure 4, where the X axis shows the census blocks ordered by ascending value of the NO 2 census block mean over 2002-2012.
While the between-census block variability of annual average NO 2 concentrations was high (Table 1 and Figure 3), Figure 4 shows that the intra-census block variability was low. It also reveals that the difference between the maximum and the minimum of the annual average of NO 2 concentrations is stable over all census blocks, from those with low NO 2 concentrations (

Step 2: Assessment of the Spatial Representativity of the Air Quality Monitoring Stations
This second step reveals seven groups of census blocks and their associated monitoring stations according to their typology (urban, peri-urban and traffic monitoring stations). The dendrogram ( Figure 5) provides a visual summary of the clustering process, presenting a picture of the groups and their proximity.
Using the groups defined by the HAC, we chose the most representative air quality monitoring station for each census block on the basis of spatial proximity (when two stations fell in the same cluster). Descriptive statistics of air quality monitoring stations are presented in Table 2.

Step 3. Illustration of Time Trends during the Study Period (2002-2012)
To illustrate the process followed in step 3, we chose the three representative stations of census block groups 4, 5 and 7 in order to describe the variability of daily concentrations measured by traffic and urban monitoring stations (named N2BONA, N2PA07, N2AUT). For each station, we selected three census blocks among all those in the corresponding group for which the station was representative of their NO2 daily variations (N2BONA: census blocks A, B, C; N2PA07: census blocks D, E, F; and N2AUT: census blocks G, H, I). Figure 6 reveals that even if census blocks were classified in the same group (defined by step 2), the trends of NO2 annual concentrations differ between census blocks during the study period (2002-2012); one could have a linear trend whereas others could not be described by linear trends. For instance, the three census blocks of group 5 (A, B and C) have the same trend until 2005, whereas after that year each census block has its own trend. Conversely, in group 7, the three census blocks (D, E and F) exhibit similar trends over the study period. In group 4, the census blocks follow the same trend at the beginning of the study period. However, after 2006, one census block tends to differ from the other two.
These observations suggest that, even within a group of census blocks for which one monitoring station has been identified as best representing the NO2 daily variability, considering only the daily NO2 concentrations of that monitoring station (the "index" monitor) will not provide good retrospective annual average estimates of NO2 concentrations at the census block level. A specific time-series analysis is therefore necessary to assess the temporal trends and estimate the specific coefficient relating each "index" monitor to each census block.
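To make the idea of a block-specific coefficient concrete, here is a deliberately simplified Python sketch. It is not the time-series analysis envisaged in the framework: as an assumption for illustration, it fits an ordinary least-squares line relating the annual means of a census block to the annual means of its "index" monitor over the years where both exist, and then applies that line to earlier monitor data. All numbers and names (`fit_line`, `backcast`) are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def backcast(monitor_by_year, block_by_year, target_years):
    """Estimate block annual means for years covered only by the monitor.

    Simplifying assumption: a single linear relation between the block and
    its index monitor holds over the whole period.
    """
    common = sorted(set(monitor_by_year) & set(block_by_year))
    slope, intercept = fit_line(
        [monitor_by_year[y] for y in common],
        [block_by_year[y] for y in common],
    )
    return {y: intercept + slope * monitor_by_year[y] for y in target_years}


# Hypothetical annual means (µg/m3): monitor data start earlier than block data.
monitor = {2000: 60.0, 2001: 58.0, 2002: 56.0, 2003: 54.0, 2004: 50.0}
block = {2002: 45.0, 2003: 43.5, 2004: 40.5}

estimates = backcast(monitor, block, target_years=[2000, 2001])
print(estimates)
```

Because Figure 6 shows that some census blocks do not follow linear trends, a single fitted line per block is at best a first approximation; the sketch only shows where a block-specific coefficient would enter the calculation.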

Estimation of the Cumulative Exposure Accounting for Residential Mobility
To take into account residential mobility (the P NN , P uN , P vN and P wN notations in Section 3.2.1), we used a database extracted from the 2006 National Census by INSEE (Institut National de la Statistique et des Etudes Economiques). The spatial distribution of the proportion of people who, in 2006, had lived in the same census block in Paris for five or ten years is presented in Figure 7: in half of the census blocks, 60% and 40% of people living in 2006 in a given census block resided in the same place five and ten years before, respectively. Unfortunately, because INSEE currently provides data on residential moves within Paris at the arrondissement level and not at the census block level, we applied our theoretical approach (described in Section 3.2.1) at this larger spatial scale for a five-year latency period (the L in our notations, Section 3.2.1).
Using a mobility matrix (see Supplementary Materials Figure S1), constructed from the national census database, we determined for each arrondissement the number of inhabitants who had changed their arrondissement of residence over the last five years (i.e., the arrondissement where people lived five years earlier) and the origins of the inhabitants coming from another arrondissement. In our example, we only consider population movements inside the study area, ignoring at this stage movements "outside" (movements to or from other cities) (see the matrix in Supplementary Materials Figure S1).
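The passage from a matrix of counts to the probabilities used in the weighting can be sketched as follows. This is a toy Python example under stated assumptions: the three areas `A1` to `A3`, the counts and the exposure levels are invented, and the layout `counts[origin][destination]` is our own convention rather than that of the matrix in Figure S1. As in the text, moves from outside the study area are ignored.

```python
def origin_probabilities(counts):
    """Convert counts[origin][destination] into probs[destination][origin].

    For each destination, the probabilities are normalised over in-area
    origins only, i.e. moves from outside the study area are ignored.
    """
    areas = list(counts)
    probs = {}
    for dest in areas:
        total = sum(counts[orig][dest] for orig in areas)
        probs[dest] = {orig: counts[orig][dest] / total for orig in areas}
    return probs


# Hypothetical counts: counts[origin][destination] = residents of `destination`
# today who lived in `origin` five years earlier.
counts = {
    "A1": {"A1": 900, "A2": 50, "A3": 50},
    "A2": {"A1": 60, "A2": 800, "A3": 40},
    "A3": {"A1": 40, "A2": 150, "A3": 910},
}
exposure_5y_ago = {"A1": 60.0, "A2": 50.0, "A3": 40.0}  # µg/m3, hypothetical

probs = origin_probabilities(counts)

# Exposure five years earlier for today's residents of A1, with and without mobility.
with_mobility = sum(p * exposure_5y_ago[o] for o, p in probs["A1"].items())
without_mobility = exposure_5y_ago["A1"]
print(round(with_mobility, 2), without_mobility)
```

Here 90% of the residents of A1 already lived there five years earlier, so the mobility-weighted value stays close to the level of A1 itself; the gap widens as the share of movers grows.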

Table 3 gives the cumulative NO 2 exposures over five years by arrondissement with and without taking into account residential movements inside the study area. The arrondissements are classified in descending order of the degree of residential mobility. The results reveal that when the mobility matrix is included in the estimation of cumulative NO 2 exposures, population exposure levels were always lower than when residential mobility was not taken into account, the difference increasing with the degree of mobility. For example, for arrondissement number 1 (with the lowest mobility rate, equal to 8.8%) and arrondissement number 15 (with the highest mobility rate, equal to 19%), the differences between NO 2 cumulative exposure estimates with and without residential mobility are equal to 3 and 6 µg/m 3 , respectively.
Table 3. Population average exposure levels to NO 2 over 5 years at the arrondissement level with and without considering residential mobility (arrondissements are ranked according to the intensity of the between-census mobility).
Table 3 also shows the difference between these two exposure levels (with and without residential mobility). Across the 20 arrondissements, this difference varies between 2.7 and 5.6 µg/m 3 .
It is this difference, expressed in µg/m 3 , which is attributed to residential mobility and may change the results of epidemiological studies; it may amount to up to 13% of average exposures and is likely to be greater when we are able to perform this analysis at the census block level.

Discussion and Perspective
This framework is designed to overcome limitations regarding exposure assessment to air pollution at a fine spatial scale in epidemiologic studies investigating long-term health effects. The conceptual model summarized in Figure 8 comprises a set of three steps. This approach comes into its own when pollutant concentration variations between census blocks are greater than those within.
Using French data, the application of steps 1 and 2 of this conceptual framework reveals that a specific and complex time-series analysis is necessary to assess temporal trends and to estimate the specific coefficient relating each "index" monitor to each census block. The third step (which is sophisticated and elaborate, and not within the scope of this paper) must be specific to each study, its context and its design.
Moreover, the illustration of the estimation of cumulative exposure accounting for residential mobility at the arrondissement level has revealed notable differences between estimates with and without residential mobility over a five-year period. However, the mobility matrix must be completed at the census block level in order to combine this approach with the retrospective reconstitution of NO 2 concentrations at the census block level. To overcome the fact that residential mobility data are not currently provided at the census block level, different apportioning solutions are available for the disaggregation of population data from arrondissement to census block level via the combination of several data sources, including topographic and land use databases.
A major limitation of our method is that it remains relatively cumbersome, labor-intensive, and computer-intensive, requiring extensive data inputs that are generally difficult to obtain. Nevertheless, this conceptual model strives to address the following two aspects.
Firstly, the retrospective construction integrates models assessing pollutant concentrations at 25 m² resolution scales, whereas up until now, the spatial resolution of previous approaches was limited to coarser spatial resolution levels: 10 km² [23], 2 km² [7,27], 1 km² [47], 200 m² [48] and 100 m² [8]. Other models with similar spatial performance can be used along the same lines. This conceptual framework was designed for ecological approaches, as in the study we are undertaking on breast cancer, but we are confident that the concept is adaptable to other study designs; for instance, examining annual air pollution concentrations of cohort participants' census blocks rather than zip codes [7]. We illustrated the conceptual model on NO2 but the same rationale can be applied to other pollutants such as PM10 or PM2.5 after adapting input data (including background pollution measurements and monitoring station measurements) accordingly.
Secondly, we propose taking residential mobility into account in the cumulative exposure assessment (using the mobility coefficient described in our framework). This is a crucial point when considering a disease with a long latency period, such as cancer. Therefore, the value of this approach lies beyond ecological, cohort and case-control studies with information on residential history, and may also be useful for enabling more accurate reconstruction of cumulative exposure.
We have also identified further limitations to this framework, mainly relating to uncertainties due to (a) the exposure assignment method and (b) the cumulative exposure assessment.
First, uncertainty around retrospective modelled concentrations stems from the spatio-temporal variability of the model input parameters, which depend on dispersion factors (both meteorological and topographical characteristics) and emission inventories, as well as on the model formulation itself. The robustness of the dispersion model integrated in our framework has, however, been thoroughly evaluated and validated in previous studies [46,49,50]. Yet this strength also has disadvantages, since atmospheric dispersion models are complex and require data-demanding software that may be difficult, or even impossible, to implement in some countries due to lack of input data.
In addition, in areas with a small number of monitoring stations during the first years of the study period (such as in our case during the 1990s), the possibility of retrospective reconstitution of air pollution concentrations may be limited. The extent of errors in retrospective estimates of ambient air concentrations may be assessed for a limited number of pollutants using information from monitoring sites that existed for the same pollutants in the past.
A second limitation relates to the intensity of residential mobility. In areas where population size and/or socio-demographic make-up has changed markedly during the time span of the epidemiological study, the method we propose will carry substantial uncertainty that will be greater in census blocks with a high degree of mobility, in comparison to sedentary census blocks.
In the next stage, we envisage an extension of our framework by also considering the daily mobility associated with occupational mobility. An alternative perspective is to evaluate the impact of the error introduced by the multiple steps of the estimation process on retrospective reconstitution of air pollution concentrations estimated at the census block level.
To achieve this, we will apply the retrospective method to other years, namely those for which the dispersion model has already provided validated estimates at the census block level. For instance, using data from 2005 to 2012, we will estimate retrospective concentrations for the period 2002 to 2004. Then, we will quantify the margins of error inherent to our approach by estimating the error between the concentrations retrospectively obtained and the modelled NO2 concentrations.
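As an indication of how such a comparison might be summarized, the sketch below computes three common error measures (mean bias, root mean square error and mean absolute relative error) between retrospective estimates and dispersion-model concentrations for the same census blocks. The choice of measures, the function name and the numbers are assumptions made for illustration; the measures to be used in the planned validation are not specified here.

```python
import numpy as np

def reconstruction_error(retrospective, modelled):
    """Summarize the error of retrospective estimates against modelled values.

    Both inputs hold one concentration per census block (µg/m³), in the same
    block order. The modelled values serve as the reference and must be
    strictly positive so that the relative error is defined.
    """
    r = np.asarray(retrospective, dtype=float)
    m = np.asarray(modelled, dtype=float)
    if r.shape != m.shape:
        raise ValueError("inputs must have the same shape")
    if np.any(m <= 0):
        raise ValueError("modelled concentrations must be positive")
    diff = r - m
    return {
        "bias": float(diff.mean()),                      # mean signed error
        "rmse": float(np.sqrt((diff ** 2).mean())),      # root mean square error
        "mean_abs_rel_error": float(np.mean(np.abs(diff) / m)),
    }

# Hypothetical values for three census blocks (µg/m³).
errors = reconstruction_error([42.0, 28.0, 51.0], [40.0, 30.0, 50.0])
```

Reporting the signed bias alongside the RMSE helps distinguish a systematic over- or underestimation from random scatter between blocks.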

Conclusions
To explore the health effects of chronic exposures, we seek to combine several sources of routinely collected and modelled data in order to improve the accuracy of cumulative estimates of NO2 concentrations over periods of years and, based on these, of population exposure. These estimates take into account residential mobility over time, with the duration of the exposure period defined according to the latency period for development of the disease. Our framework explains how to surmount two principal difficulties in cumulative long-term exposure assessment: retrospective reconstitution of past ambient air concentrations, and consideration of residential mobility in assigning exposure levels to neighborhoods and, as appropriate according to study design, to individuals. We have provided guidelines on how census databases and geostatistical methods may inform an approach for correcting the above-described sources of cumulative exposure misclassification. The conceptual framework is flexible and convenient for the needs of different epidemiological study designs. As a domain of application for future work, it would be interesting to explore the impact of the above-described misclassification on assessing the relationship between long-term exposure and cancer risk.
Author Contributions: Séverine Deguen, Wahida Kihal-Talantikite and Cindy Padilla derived the original conceptual framework. Séverine Deguen, Wahida Kihal-Talantikite, Cindy Padilla and Denis Zmirou-Navier together developed the concept and applied it to the illustration used. All authors collaborated on writing and revising the paper.

Conflicts of Interest: The authors declare no conflict of interest.