Data on Healthy Food Accessibility in Amsterdam, the Netherlands

This data descriptor introduces data on healthy food supplied by supermarkets in the city of Amsterdam, The Netherlands. In addition to two neighborhood variables (i.e., share of autochthons and average housing values), the data comprise three street network-based accessibility measures derived from analyses using a geographic information system. Data are provided at a spatial micro-scale using grid cells with a spatial resolution of 100 m. We explain how the data were collected and pre-processed, and how alternative analyses can be set up. To illustrate the use of the data, an example is provided using the R programming language.


Introduction
Spatial accessibility to healthy food is important for people's health [1]. In that respect, supermarkets play an essential role by offering healthy and fresh foods at more competitive prices than smaller grocery stores or convenience stores [2]. However, supermarket access is not uniform but varies significantly across cities, resulting in dietary inequalities across urban neighborhoods. Research to date suggests that people residing in socially distressed neighborhoods (i.e., those with a low socioeconomic status), as well as in neighborhoods predominantly inhabited by ethnic minorities, have poorer spatial access to supermarkets. Such areas are often denoted as "food deserts" [3].
While food deserts seem to be omnipresent in the U.S., evidence concerning their existence in Canadian or European cities is mixed and far from conclusive [1]. Reasons for the divergent findings include the methodology applied, which mainly relies on geographic information systems (GIS) to compute accessibility indicators, and the statistical models used [4]. Existing studies are often conceptually simple, applying a single accessibility measure at a coarse analytical scale (e.g., administrative units). Multidimensional accessibility indicators combining proximity to, and density and variety of, supermarkets have therefore been suggested [5][6][7]. Moreover, food deserts are frequently identified by means of descriptive approaches (e.g., quartiles). Although such approaches allow a straightforward operationalization, they disregard that both accessibility and neighborhood characteristics are key to food desert mapping, which calls for multivariate data clustering [4]. Finally, to the best of our knowledge, previous studies have not made the underlying research data (e.g., primary data, secondary data, and derived measures) publicly available, even though the reproducibility of the findings on which knowledge is built is imperative and a fundamental aspect of scientific investigation. Brunsdon [8] highlights several benefits of sharing methods and data repositories, including clear documentation, transparency concerning pre-processing, and the possibility to validate results, to apply alternative analytical approaches, and to serve as a basis for follow-up studies. All of these benefits ultimately lead to more reliable research.
Data 2017, 2, 7
This data descriptor addresses the aforementioned research gaps by describing in detail, and sharing, the data related to the research article "Food Deserts? Healthy Food Access in Amsterdam" [9]. It describes both the data collection and the procedures used to pre-process the data, and gives an overview of how the data can be used. Note that the interpretation of the results is given in the companion article. The provided data are relevant not only for mapping healthy food accessibility but also for other studies, for example, analyses of health behavior, and can be linked to ongoing area-based or register studies in Amsterdam.

Data Description
Table 1 summarizes key characteristics of the dataset.

ID: Unique identifier
Version¹: 1.0
¹ In case the data are updated, the version number will be changed. Older versions will be archived.

Study Area and Analysis Scale
The study area was the city of Amsterdam, The Netherlands, located at 52°22′ N, 4°53′ E. Figure 1 shows the location of the study area. We selected Amsterdam because the health monitor [10] reported distinct differences in overweight and obesity prevalence. For example, 40% of the residents are overweight, and 75% of the adults do not consume the recommended amount of fruit and vegetables. Overweight prevalence also varies significantly across space: a pronounced overweight prevalence of 22% is found in the central areas, and the rates are even higher in the northern parts of Amsterdam.
In contrast to most food desert studies, we sought to avoid methodological complications arising from the use of census areas (e.g., their uneven sizes). To go beyond administrative units, we overlaid the study area with a grid of cells with a spatial resolution of 100 × 100 m. In total, information is available for 5242 cells. Note that the data provided here include only cells where people reside; cells without a residential population were queried and excluded from further analyses. Furthermore, an ID (e.g., E1281N4931) introduced by Statistics Netherlands [11] was attached to each cell, allowing straightforward linkage with other administrative data. The grid cells are provided as an ESRI™ shapefile and as an R data object (see Section 4).
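Because every cell carries the Statistics Netherlands ID, linking other cell-level data reduces to a key-based join. The following is a minimal sketch in R, assuming a hypothetical external table `admin` with a hypothetical variable `n_residents`; apart from E1281N4931, the IDs and all values are invented for illustration.

```r
# Hypothetical attribute table of grid cells (in the dataset: data@data);
# all IDs except E1281N4931 and all values are made up for illustration
cells <- data.frame(
  ID   = c("E1281N4931", "E1282N4931", "E1283N4931"),
  PROX = c(350, 820, 1210)
)

# Hypothetical external administrative table keyed by the same cell ID
admin <- data.frame(
  ID          = c("E1283N4931", "E1281N4931"),
  n_residents = c(45, 120)
)

# Left join via match(): every grid cell is kept, unmatched cells receive NA
cells$n_residents <- admin$n_residents[match(cells$ID, admin$ID)]
print(cells)
```

The same match() pattern is used in Section 4 to attach the cluster memberships to the grid cells.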

Data Sources and Pre-Processing

Supermarket Data
The initial search for all supermarket chains operating in The Netherlands was guided by an overview published in Wikipedia [12] and the newspaper Levensmiddelen Krant [13]. For each supermarket chain, the stores located within the administrative unit of Amsterdam, including those within a buffer zone of two kilometers around the city, were collected. The buffer zone was necessary to avoid edge effects in the accessibility measures, since residents near the city boundary may be served by stores just outside it. For theoretical reasons, organic supermarkets and "to go" stores were disregarded. Each company's webpage was queried to obtain the store addresses (i.e., the street name and the building number). A total of 144 supermarkets were identified during the data collection phase in November 2015; of these, 122 are located within the administrative area of Amsterdam. Table 2 provides some information about the stores. The Dutch cadastral data Basisregistraties Adressen en Gebouwen [14] and ArcGIS Online were then used to convert the individual store addresses into geographic coordinates, which were subsequently projected onto the local coordinate system (i.e., EPSG code 28992, Amersfoort / RD New). A detailed description of the projection is given in Section 4.
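The store selection rule (inside the city or within two kilometers of it) can be sketched without a GIS. This is a minimal base-R sketch, assuming store coordinates already projected to EPSG:28992 (so that distances are in meters) and approximating the administrative boundary of Amsterdam by a hypothetical square polygon. A store is kept if it lies inside the polygon (ray casting) or within 2000 m of its boundary. It illustrates the rule only and is not the procedure used to compile the dataset.

```r
# Ray casting: TRUE if point (x, y) lies inside the polygon with vertices (px, py)
point_in_polygon <- function(x, y, px, py) {
  inside <- FALSE
  j <- length(px)
  for (i in seq_along(px)) {
    if (((py[i] > y) != (py[j] > y)) &&
        (x < (px[j] - px[i]) * (y - py[i]) / (py[j] - py[i]) + px[i])) {
      inside <- !inside
    }
    j <- i
  }
  inside
}

# Shortest Euclidean distance (m) from (x, y) to the polygon boundary
dist_to_boundary <- function(x, y, px, py) {
  n <- length(px)
  d <- Inf
  for (i in seq_len(n)) {
    j <- if (i == n) 1 else i + 1
    dx <- px[j] - px[i]
    dy <- py[j] - py[i]
    # relative position of the projection onto segment i-j, clamped to the segment
    t <- max(0, min(1, ((x - px[i]) * dx + (y - py[i]) * dy) / (dx^2 + dy^2)))
    d <- min(d, sqrt((x - px[i] - t * dx)^2 + (y - py[i] - t * dy)^2))
  }
  d
}

# Keep stores inside the city or within `buffer` meters of its boundary
select_stores <- function(sx, sy, px, py, buffer = 2000) {
  vapply(seq_along(sx), function(k) {
    point_in_polygon(sx[k], sy[k], px, py) ||
      dist_to_boundary(sx[k], sy[k], px, py) <= buffer
  }, logical(1))
}

# Hypothetical square "city" (15 km x 15 km) and three stores, in EPSG:28992 meters
px <- c(115000, 130000, 130000, 115000)
py <- c(480000, 480000, 495000, 495000)
sx <- c(120000, 131000, 134000)
sy <- c(485000, 490000, 490000)
keep <- select_stores(sx, sy, px, py)
print(keep)
```

Here the first store lies inside the city, the second lies 1000 m outside it and is kept, and the third lies 4000 m outside it and is dropped.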

Accessibility Measures
For the accessibility measures, the centroid of each cell was computed to serve as the origin, while the supermarket locations served as destinations. The accessibility indicators were calculated using the street network provided by ESRI (version 2008) as input and a function iterating over all origins (cells).
Based on a literature review (e.g., [4,5]), three complementary supermarket accessibility measures were considered. The first indicator is the network distance (in meters) from cell i to the closest supermarket j of any chain (proximity measure). For the second measure, we first computed a street network buffer (service area) of 1000 m around each cell centroid and then applied GIS-based point-in-polygon analyses to determine the number of stores within this area (density measure). The threshold distance is based on a review of the literature and represents a 12-min walk for an adult [4]. The final accessibility measure differentiates between supermarket chains and represents the mean network distance (in meters) from each centroid i to the three nearest supermarkets j belonging to three different chains k (variety measure). The variety measure accounts for the fact that different chains offer different products.
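Written as formulas, the three measures read as follows. This is a sketch under our reading of the text, with notation that is ours rather than the original's: d_ij is the street-network distance (in meters) from the centroid of cell i to supermarket j, S is the set of supermarkets, c(j) is the chain of store j, and K_i is the set of the three chains whose nearest stores are closest to cell i. The density formula treats the 1000 m service area as all stores within 1000 m of network distance, which the polygon-based point-in-polygon analysis approximates. The last line checks that the threshold corresponds to a walking speed of 5 km/h.

```latex
\begin{aligned}
\mathrm{PROX}_i &= \min_{j \in S} d_{ij} \\
\mathrm{DENS}_i &= \bigl|\{\, j \in S : d_{ij} \le 1000 \,\}\bigr| \\
\mathrm{VARI}_i &= \frac{1}{3} \sum_{k \in K_i} \; \min_{j \in S,\; c(j) = k} d_{ij} \\[4pt]
\frac{1000\,\text{m}}{12\,\text{min}} &= \frac{1\,\text{km}}{0.2\,\text{h}} = 5\,\text{km/h}
\end{aligned}
```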
To derive these measures, the ArcGIS 10.3 Network Analyst extension was used, with the centroids as incidents and the supermarket locations as facilities. For all analyses, all routing restrictions (e.g., one-way streets) were disabled.
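The three measures can also be computed outside ArcGIS on any street graph. This is a minimal base-R sketch, assuming a hypothetical toy network whose nodes stand in for cell centroids and store locations snapped to the network. Edges are traversable in both directions, mirroring the disabled one-way restrictions, and VARI follows our reading that each chain's nearest store is taken before averaging over the three closest chains. It is not the ArcGIS Network Analyst procedure used for the dataset.

```r
# Toy street network (hypothetical): undirected edges with lengths in meters
edges <- data.frame(
  from = c(1, 2, 3, 4, 2, 5),
  to   = c(2, 3, 4, 5, 5, 6),
  len  = c(300, 400, 500, 200, 600, 700)
)
n_nodes <- 6

# Hypothetical supermarkets: network node and chain of each store
stores <- data.frame(node = c(3, 5, 6, 4), chain = c("A", "B", "B", "C"))

# Dijkstra's algorithm: network distances from one origin node to all nodes
dijkstra <- function(source, edges, n_nodes) {
  best <- rep(Inf, n_nodes)
  best[source] <- 0
  visited <- rep(FALSE, n_nodes)
  repeat {
    cand <- which(!visited & is.finite(best))
    if (length(cand) == 0) break
    u <- cand[which.min(best[cand])]   # closest unvisited node
    visited[u] <- TRUE
    for (e in seq_len(nrow(edges))) {
      # neighbour of u along edge e, if e touches u (edges are two-way)
      v <- if (edges$from[e] == u) edges$to[e] else if (edges$to[e] == u) edges$from[e] else NA
      if (!is.na(v) && best[u] + edges$len[e] < best[v]) best[v] <- best[u] + edges$len[e]
    }
  }
  best
}

# PROX, DENS and VARI for one origin (a cell centroid snapped to a network node)
accessibility <- function(origin, stores, edges, n_nodes, threshold = 1000, n_chains = 3) {
  d <- dijkstra(origin, edges, n_nodes)[stores$node]          # distance to each store
  nearest_by_chain <- as.vector(tapply(d, stores$chain, min))  # closest store per chain
  c(PROX = min(d),
    DENS = sum(d <= threshold),
    VARI = mean(sort(nearest_by_chain)[seq_len(n_chains)]))
}

# Iterate over all origins, analogous to iterating over all cells
origins <- c(1, 2)
res <- t(sapply(origins, accessibility, stores = stores, edges = edges, n_nodes = n_nodes))
print(res)
```

For the origin at node 1, the nearest store is 700 m away, two stores lie within 1000 m, and the nearest stores of chains A, B, and C are 700, 900, and 1100 m away, which gives a variety value of 900 m.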

Neighborhood Data
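We extracted neighborhood information on two variables for each cell from the raster dataset (vierkanten) maintained by Statistics Netherlands (www.cbs.nl) [11]. The first variable is the proportion of native Dutch residents (autochthons) per cell in the year 2014, which serves as an indicator of ethnic composition; it was converted to five numeric classes ranging from 1 (lowest share) to 5 (highest share) (see Table 1).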
The second variable is the average housing value per cell, which serves as a proxy for area-based socioeconomic status. Housing values are given in units of €1000. To be considered a building, a construction must have at least 14 m² of living space, a toilet, a kitchen, etc. For each cell, the average housing value in the year 2011/12 is given. It is calculated on the basis of the property register (Woningregister), which includes only houses that serve as main residences and in which no commercial activities take place. For a detailed discussion, see the original data description [11].
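Users who want to apply the same ethnicity classes to other percentage data (e.g., a later release of the Statistics Netherlands grid) can recode the values with cut(). This is a minimal sketch with hypothetical input values. Only the class ranges listed in Table 1 are documented, so treating each class as including its lower bound is our assumption.

```r
# Recode percentages of native Dutch residents into the five NATI classes
# (1 = below 40%, ..., 5 = 90% and above); assigning boundary values such as
# exactly 90% to the higher class is an assumption
nati_class <- function(pct) {
  cut(pct, breaks = c(-Inf, 40, 60, 75, 90, Inf), labels = FALSE, right = FALSE)
}

# Hypothetical cell-level percentages
share_native <- c(95, 90, 80, 62, 45, 30)
nati <- nati_class(share_native)
print(nati)
```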

Data Usage and Application
To be independent of proprietary software and to facilitate easy use of the data, we compiled the dataset for the R language and environment for statistical computing [15]. R is open source, available for all major platforms, and extensible with a large number of add-on packages. The data are shipped as a zipped shapefile and as an R data object. The R object is a SpatialPolygonsDataFrame comprising both locational information (i.e., the cells) and the attached attribute information. To illustrate access to and usage of the data, this section provides R code and briefly presents an alternative analysis.
While the original research paper [9] applied contextual neural gas [16,17] to account for spatial autocorrelation [18], we complement this spatially explicit approach with a non-spatial analysis using the widely applied self-organizing map (SOM) [19,20] (other options can be found in [21]). A SOM is an unsupervised artificial neural network for data clustering and visualization [19]. The interpretation of the results, however, is beyond the scope of this article and can be found elsewhere [9]. Note that the supermarket location data can also be used with, for example, cluster detection algorithms [22,23]. To facilitate quick access and use, the first code snippet loads the required R packages [24][25][26][27][28], sets the workspace, and unzips the data.
Next, the input variables for the SOM are selected. These variables are scaled (i.e., standardized to zero mean and unit variance) before the SOM topology (10 × 8 units) is set up. This grid serves as input for the SOM algorithm.
# select variables for SOM-based clustering
data.som <- as.matrix(data@data[, c("NATI", "HOUS", "PROX", "DENS", "VARI")])
# scale the selected variables to zero mean and unit variance
data.som.sc <- scale(data.som)
# create the SOM topology (10 x 8 units on a hexagonal grid)
som.grid <- somgrid(xdim = 10, ydim = 8, topo = "hexagonal")
set.seed(20082014)
# train the SOM (kohonen 2.x argument names; kohonen >= 3.0 may need adapting,
# e.g., its som() expects X instead of data and no longer accepts n.hood)
som.res <- som(data = data.som.sc, grid = som.grid, rlen = 100, alpha = c(0.05, 0.01),
               keep.data = TRUE, n.hood = "circular")
summary(som.res)
# som map of size 10x8 with a hexagonal topology.
# Training data included; dimension is 5242 by 5
# Mean distance to the closest unit in the map: 0.3151507
A key strength of SOMs is their rich visualization capabilities. The SOM training progress in Figure 2 shows considerable improvements after a few iterations and convergence after approximately 50 training cycles. Moreover, the U-matrix [19] can be plotted to investigate clusters, while the component planes are useful for discovering correlations among the variables. Figure 3 depicts two component planes as an example.
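The idea behind the U-matrix can be illustrated independently of kohonen's plotting functions. This is a minimal sketch on a hypothetical three-unit map with a rectangular layout. For each unit, it averages the distances between that unit's codebook vector and the codebook vectors of its immediate map neighbours (units one grid step away). With the trained object, the unit positions `som.grid$pts` and the codebook (`som.res$codes` in kohonen 2.x) could presumably be passed instead. kohonen's own `plot(som.res, type = "dist.neighbours")` shows a closely related quantity, summing rather than averaging these distances.

```r
# U-matrix sketch: mean codebook distance of each unit to its immediate map neighbours
umatrix <- function(codes, pos, tol = 1e-6) {
  unit_dist <- as.matrix(dist(pos))    # distances between units on the map grid
  code_dist <- as.matrix(dist(codes))  # distances between codebook vectors in data space
  sapply(seq_len(nrow(codes)), function(u) {
    neighbours <- abs(unit_dist[u, ] - 1) < tol   # units exactly one grid step away
    mean(code_dist[u, neighbours])
  })
}

# Hypothetical 3 x 1 map with two-dimensional codebook vectors
pos   <- as.matrix(expand.grid(x = 1:3, y = 1))
codes <- rbind(c(0, 0), c(1, 0), c(5, 0))
u <- umatrix(codes, pos)
print(u)  # high values mark boundaries between clusters
```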
In the next step, the SOM units are clustered by means of the k-means algorithm. To determine an appropriate number of clusters, we loop through two to ten clusters and compute the within-cluster sum of squares for each clustering. Plotting these values shows an elbow at seven clusters, indicating a suitable solution. The clustering results are matched with the initial data and visualized as the geographic map shown in Figure 4.
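The elbow search and the final k-means step can be sketched with base R's kmeans(). This is a minimal sketch in which a simulated 80 × 5 matrix stands in for the codebook of the trained 10 × 8 SOM. With the real object, the codebook would presumably be `som.res$codes` (kohonen 2.x) or `getCodes(som.res)` (kohonen 3.x). The choice `nstart = 25` is ours, not taken from the article.

```r
set.seed(1)
# Hypothetical stand-in for the 80 x 5 codebook matrix of the trained 10 x 8 SOM
codes <- rbind(matrix(rnorm(40 * 5, mean = 0), ncol = 5),
               matrix(rnorm(40 * 5, mean = 3), ncol = 5))

# Total within-cluster sum of squares for k = 2, ..., 10 (inspect for an elbow)
ks  <- 2:10
wss <- sapply(ks, function(k) kmeans(codes, centers = k, nstart = 25)$tot.withinss)

# Cluster the SOM units with the k chosen at the elbow (seven in the article)
som.cluster.res <- kmeans(codes, centers = 7, nstart = 25)$cluster
```

Plotting `wss` against `ks` (e.g., `plot(ks, wss, type = "b")`) shows where additional clusters stop reducing the within-cluster sum of squares markedly. Clustering the 80 codebook vectors rather than the 5242 cells keeps k-means cheap, and each cell then inherits the cluster of its best-matching unit through `som.res$unit.classif`.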
# plot the SOM units, coloured by k-means cluster, with the cluster boundaries
plot(som.res, type = "mapping", bgcol = brewer.pal(7, "Set1")[som.cluster.res], main = "Clusters")
add.cluster.boundaries(som.res, som.cluster.res)
# plot clusters on a geographical map: assign each cell the cluster of its best-matching unit
cluster_details <- data.frame(id = data$ID, cluster = som.cluster.res[som.res$unit.classif])
# attach each cell's cluster to the attribute table of the grid cells
data@data <- data.frame(data@data, cluster_details[match(data@data[, "ID"], cluster_details[, "id"]), ])
data$fcluster <- as.factor(data$cluster)
# map the clusters, one colour per cluster
spplot(data, "fcluster", col = "transparent", col.regions = brewer.pal(7, "Set1"))
Exploring the descriptive statistics of each cluster indicates that cluster 5 could be related to pockets of food deserts, even though the corresponding cells are located exclusively in the urban periphery. This challenges the classic interpretation of food deserts but confirms differences in spatial accessibility to healthy food supplied by supermarkets and in area-based socioeconomic characteristics within the city of Amsterdam. In conclusion, no empirical evidence was found supporting the notion of pronounced inequalities in access to healthy food.
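Per-cluster descriptive statistics of this kind can be obtained with base R's aggregate(). This is a minimal sketch, assuming a hypothetical data frame `cells` with made-up values in place of `data@data` after the cluster join. With the real object, NATI and VARI can be added to the cbind() call.

```r
# Hypothetical stand-in for data@data after the cluster join (values are made up)
cells <- data.frame(
  cluster = c(1, 1, 2, 2, 2),
  PROX    = c(300, 500, 1500, 1700, 1600),
  DENS    = c(6, 4, 0, 1, 0),
  HOUS    = c(250, 310, 180, 200, 190)
)

# Mean of each variable per cluster
desc <- aggregate(cbind(PROX, DENS, HOUS) ~ cluster, data = cells, FUN = mean)
# Number of cells per cluster (aggregate() returns clusters in sorted order)
desc$n_cells <- as.vector(table(cells$cluster))
print(desc)
```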
Table 1 (continued).
Proximity (PROX): Numeric, distance to the closest supermarket from each cell (in meters)
Density (DENS): Numeric, number of stores within a 1000 m street network buffer around each cell
Variety (VARI): Numeric, mean distance to three supermarkets of three different chains from each cell (in meters)
Ethnicity (NATI): Numeric, proportion of native Dutch within a cell in the year 2014 (converted to the following numeric values: 5 = more than 90%, 4 = 75%-90%, 3 = 60%-75%, 2 = 40%-60%, 1 = less than 40%)
Housing (HOUS): Numeric, average housing price per cell in the year 2011/12 (in €1000)

Figure 3. Component planes for the variables (a) proximity and (b) density.

Figure 4. Result of the SOM-based clustering.

Additional summary statistics of the variables are reported below.