
Field Validation of Commercially Available Food Retailer Data in the Netherlands

1
Department of Epidemiology and Biostatistics, Amsterdam UMC, Vrije Universiteit Amsterdam, 1117 de Boelelaan, Amsterdam, The Netherlands
2
Upstream Team, Amsterdam UMC, 1117 de Boelelaan, The Netherlands
3
Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, 3584 CX Utrecht, The Netherlands
4
Faculty of Geosciences, Department of Human Geography and Spatial Planning, Utrecht University, 3508 TC Utrecht, The Netherlands
*
Author to whom correspondence should be addressed.
Int. J. Environ. Res. Public Health 2020, 17(6), 1946; https://doi.org/10.3390/ijerph17061946
Received: 20 February 2020 / Revised: 12 March 2020 / Accepted: 14 March 2020 / Published: 16 March 2020

Abstract

The aim of this study was to validate a Dutch commercial dataset containing information on the types and locations of food retailers against field audit data. Field validation of a commercial dataset (“Locatus”) was conducted in February 2019. Data on the location and classification of food retailers were collected through field audits in 152 streets from four urban and four rural neighborhoods in the Netherlands. The classification of food retailers included eight types of grocery stores (e.g., supermarkets, bakeries) and four types of food outlets (e.g., cafés, take away restaurants). The commercial dataset in the studied area listed 322 food retailers, whereas the field audit counted 315 food retailers. Overall, the commercially available data showed “good” to “excellent” agreement statistics (>0.71) with field audit data for all three levels of analysis (i.e., location, classification and both combined) and across urban as well as rural areas. The commercial dataset under study provided an accurate description of the measured food environment. Therefore, policymakers and researchers should feel confident in using this commercial dataset as a source of secondary data.
Keywords: validity; retail food environment; foodscape; street audit; ground-truthing

1. Introduction

The food environment plays an important role in shaping dietary habits, and consequently people’s health [1]. The food environment is defined as “the multiplicity of sites where food is displayed for purchase, and where it may also be consumed” [2], and may both enhance and inhibit healthy dietary behaviors through ubiquitous opportunities to purchase and consume large varieties of healthy and unhealthy foods [3]. Given the potential opportunities to improve population diets and health outcomes, there is increasing interest in studies on the influence of the food environment on dietary behaviors [4,5].
The majority of the studies that have examined the link between food environment and health-related outcomes have focused on the geographical availability or accessibility of food retailers in relation to where people live and/or work [5]. From a public health perspective, the type of food retailers that exist near residential areas and schools may be especially relevant to the youth. For example, a greater availability and proximity to unhealthy food retailers has been associated with obesity and other diet-related chronic diseases, such as type 2 diabetes and cardiovascular disease [6,7,8,9,10,11,12]. Likewise, greater distance to fast food/convenience shops and proximity to retailers selling healthier foods has been shown to be associated with healthier dietary habits and better health statuses [9,13,14,15]. However, the current body of research in this field is inconsistent [6,16,17,18,19,20,21,22,23,24,25,26,27,28], with many studies showing null or counterintuitive results.
A recent literature review has suggested various possible causes for the lack of consistent results found in food environment research. These potential causes include the methodological procedures employed, as well as the quality of the datasets used [29], which is further considered in the current study. Data on the geographical location of food retailers can be obtained from various sources. Primary data sources are gathered directly by research groups as systematic observation and notation of surrounding features, usually while walking through a specific area. Primary data sources are also known as field audits or ground-truthing, and are described as the “gold standard” in the field of food environment research [30,31]. Secondary data sources are collected by an external party to the research group and usually have purposes beyond health research objectives [32]. Examples of this include commercial retail data, yellow pages, and governmental repositories.
Even though secondary data sources are readily available and relatively accessible [29], they have some of the following limitations: they may provide incomplete information about food retailers; they may use a different classification of food retailers than those needed for health research purposes; they are often not freely available; and they may not be updated regularly [33,34,35]. Conversely, despite primary data sources being described as the “gold standard”, they are not without limitations. For example, since extensive fieldwork is required to obtain them, collecting primary data is costly and time-consuming [36]. Hence, studies exploring the relationships between food environment and health outcome have largely relied on secondary data sources [29], raising questions regarding their validity.
Previous studies have performed validation analyses of secondary data sources in various settings, countries, and populations [37,38], but this is still not a common practice in food environment research. Moreover, since food environment research is increasing, much work still remains to be done in order to identify which secondary food retailer data sources are most appropriate. To the best of our knowledge, no empirical evidence concerning the validity of secondary data sources exists in the Netherlands. Therefore, in order to help policymakers and researchers make informed decisions about which data sources to use, we aimed to test the validity of a Dutch commercial dataset containing information on geographical locations and types of food retailers against field audit data.

2. Materials and Methods

Since this validation study did not involve individual level data, it was exempt from approval by the Medical Ethics Committee of the Vrije Universiteit Medical Center in Amsterdam, the Netherlands.

2.1. Study Areas

The Netherlands is geographically divided into four regions [39]. The North Netherlands comprises the provinces Groningen, Friesland, and Drenthe; the East Netherlands comprises Overijssel, Gelderland, and Flevoland; the West Netherlands comprises Utrecht, North Holland, South Holland, and Zeeland; and the South Netherlands comprises North Brabant and Limburg.
To obtain an accurate representation of the Dutch food environment, we selected eight neighborhoods in total, two in each of the four above-mentioned regions, of which four were in an urban area and four in a rural area. Both urban and rural areas were selected, as the validity of food environment data may differ across urbanization levels [38]. In this regard, urbanization level information of Dutch neighborhoods was accessed via the Dutch Centraal Bureau voor de Statistiek (CBS) in February 2019 [40]. CBS distinguishes different levels of urbanization based on the number of addresses per km² [41]: very strongly urbanized areas (environmental address density of 2500 or more), strongly urbanized areas (environmental address density from 1500 to 2500), moderately urbanized areas (environmental address density from 1000 to 1500), low urbanized areas (environmental address density from 500 to 1000), and non-urban areas (environmental address density of less than 500) [40]. For the sake of this study, we considered very strongly and strongly urbanized areas as “urban” areas, and low and non-urbanized areas as “rural” areas.
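The urbanization grouping described above can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the treatment of the exact boundary values (a density of exactly 1500, for instance, is assigned to the higher class) is an assumption, since the text gives overlapping range endpoints.

```python
from typing import Optional


def urbanization_level(address_density: float) -> str:
    """Map environmental address density (addresses per km²) to a CBS level.

    Assumption: lower bounds are inclusive (e.g., exactly 1500 -> "strongly").
    """
    if address_density >= 2500:
        return "very strongly urbanized"
    if address_density >= 1500:
        return "strongly urbanized"
    if address_density >= 1000:
        return "moderately urbanized"
    if address_density >= 500:
        return "low urbanized"
    return "non-urban"


def study_group(address_density: float) -> Optional[str]:
    """Collapse CBS levels into the "urban" / "rural" groups used in this study."""
    level = urbanization_level(address_density)
    if level in ("very strongly urbanized", "strongly urbanized"):
        return "urban"
    if level in ("low urbanized", "non-urban"):
        return "rural"
    return None  # moderately urbanized areas belong to neither group


print(study_group(3200), study_group(1200), study_group(450))
```

Under this grouping, moderately urbanized neighborhoods fall outside both study groups.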
Since the field audit was conducted on foot by one auditor, and due to research time constraints, the sampling was based on relatively small and similar sized areas that were accessible by public transport. This convenience sampling process resulted in selection of the following eight neighborhoods: Binnenstad-Noord (urban; Groningen), Oosterhaar (rural; Haren), Binnenstad (urban; Oldenzaal), Neede (rural; Berkelland), Binnenstad-Centrum (urban; ‘s-Hertogenbosch), Vliedberg (rural; Heusden), Anjeliersbuurt Noord/Zuid and Driehoekbuurt (urban; Amsterdam), and Ouderkerk aan de Amstel (rural; Ouder-Amstel).
In each of the predefined eight neighborhoods, 19 streets were selected for comparison with field audits, resulting in a total of 152 streets. The selection of streets was performed by a researcher who did not conduct the field audit. Within each selected neighborhood, streets that contained at least one food retailer were randomly selected from the Locatus dataset. Streets with no food retailer present were not included in the Locatus dataset. However, it was important that agreement between the two data sources on the absence of food retailers could also be assessed. Therefore, streets with no food retailer present were selected from Google Maps. Streets were selected in such a way that 91 of these streets had at least one food retailer present, while the other 61 streets did not contain any food retailer. If a selected street crossed a neighborhood boundary, the given street was audited only until the limits of the selected neighborhood were reached.

2.2. Data Sources

Data on food retailers were obtained from Locatus [42], a Dutch company that collects information on different types of retail outlets for commercial purposes. Locatus covers all of the Netherlands and is widely used among retailers, policymakers, and researchers. Locatus collects information on location, type, size, and opening times of all retailers through systemic area scans, which are conducted by employees of Locatus via field audits. Food outlets in shopping areas are audited every year, while food outlets in scattered shopping areas are audited every two to three years, as the presence of retailers located in these areas tends to be more stable. When an audit takes place, all food retailers present in that neighborhood are assessed. In addition, office worker employees of Locatus conduct surveys and telephone interviews to receive updates about retailers, which makes it likely that changes in the food environments of both shopping and scattered shopping areas are noted within a year. In order to minimize the mismatch between the available data and field audit data in this study, the latest available data dating from July 2018 were used.
The field audit was conducted between 22 February and 2 March 2019. For the collection of the field audit data, the first author was instructed to scan the study area on foot in order to systematically observe and note the surrounding features. A standardized protocol was developed among the involved researchers to guide the field audit. The field audit proceeded as follows: (1) both sides of all selected streets were audited on foot; (2) the name, type, and location of each food retailer were recorded on a digital checklist; (3) establishments were classified based on external clues (e.g., names and signs); (4) in case of doubt, the auditor was instructed to enter the food retailer, consult the menu to identify main meals or products sold, or check the opening hours, assuming that, for instance, a full-service restaurant usually opens after 17:00. Establishments that had a sign indicating permanent closure or that appeared to be permanently closed were not considered to be present in the field. Importantly, the auditor was blinded to the information provided by the commercial dataset (only the street name was known), in order to prevent bias.

2.3. Field Audit Classification of Food Retailers

As shown in Table 1, two main categories (grocery stores and food outlets) were constructed and further divided into 12 subcategories of food retailers based on the food retailer classifications developed by Clary and Kestens [43] and Locatus’s definitions. These classifications were constructed prior to the field audit process.
Specifically, the grocery stores were classified and defined as follows:
  • Supermarkets: large food store chains selling a wide range of fresh, packaged, and frozen food products;
  • Local product shops: independent food stores selling a wide range of local and/or ethnic food products from EU and non-EU countries;
  • Fruit and vegetable stores: food stores selling mostly fruits, vegetables, and nuts;
  • Bakeries: food stores selling bread and other baked products;
  • Animal product stores: food stores selling mainly animal products, such as meat, dairy, or fish;
  • Natural product stores: food stores selling mostly superfoods, food supplements, homeopathic products, herbs, or coffee/tea;
  • Convenience stores: food stores selling a limited range of fresh and healthy food products, and primarily offering snack foods;
  • Confectionery stores: food stores specialized in selling a wide assortment of sweet products, including pastries, chocolates, candies, and ice-cream.
Similarly, the food outlets were given the following classifications based on the following characteristics:
  • Restaurants: chain or independent restaurants with à la carte menu or buffet, offering ready-to-eat foods with table service or the possibility to sit at a table;
  • Fast food restaurants: chain or independent restaurants with counter service only and limited seating options, selling mainly cheap ready-to-eat high energy density foods served a few minutes after ordering;
  • Take away restaurants: chain or independent restaurants where ready-to-eat food is delivered or picked up with no or limited seating options;
  • Cafés: chain or independent retailers offering alcoholic/non-alcoholic beverages and serving ready-to-eat sweet/salty snacks and meals, with the possibility to sit in and/or take away.
Although there are many retailers that have dual purposes (e.g., a fast food restaurant that also delivers meals), retailers were classified on the basis of their main purpose. Establishments whose business purpose was to sell beverages only, such as bars and liquor stores, were not considered to be food retailers in the present study. In addition, street vendors, such as food trucks and market stalls, were excluded from the classification system due to their itinerant nature.

2.4. Statistical Analyses

First, matching food retailers needed to be identified. In order to do so, the Locatus and the audit datasets were merged into one file, and streets listed in the commercial dataset were compared to the field audit data. Three different matching levels were considered: “location”, “classification”, and both combined. For instance, a match between two food retailers was established when they were either present in the same location (i.e., according to street name and house numbering), had the same classification, or shared both a location and classification. In other words, food retailers listed in the commercial dataset were defined as true positives (TPs) if they were equally classified or found to be in the exact same location by the field audit. Non-matches were interpreted as false positives (FPs) if food retailers were listed in the commercial dataset but did not match with the field audit data, or vice versa (i.e., false negatives (FNs)). “Empty” streets, in which food retailers were found neither in the commercial dataset nor by the field audit, were referred to as true negatives (TNs). Importantly, if there were spelling differences in the business name, discrepancies in the street name because a food retailer was located at a street junction, or errors in the house numbering, the retailer was still considered a match. This approach is known as “relaxed” matching, and has been described as the most appropriate method when investigating the validity of a dataset, since the specific retailer name or exact address are of minor importance [43,44].
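To make the idea of relaxed matching at the “location” level concrete, the following Python sketch counts TPs, FPs, and FNs by pairing listed and audited retailers per street while ignoring business names and house numbers. It is a simplified illustration, not the procedure used in this study (where matching was done on a merged file in SPSS); the record layout, the helper names, and the example streets are all hypothetical.

```python
from collections import Counter


def normalize(street: str) -> str:
    """Lower-case and trim a street name so spelling variants compare equal."""
    return street.strip().lower()


def relaxed_location_counts(listed, audited):
    """Return (TP, FP, FN) at the location level under a street-only match.

    Assumption: two records match if they lie on the same street; house
    numbers and business names are deliberately ignored ("relaxed" matching).
    """
    listed_per_street = Counter(normalize(r["street"]) for r in listed)
    audited_per_street = Counter(normalize(r["street"]) for r in audited)
    # On each street, at most min(listed, audited) retailers can be paired.
    tp = sum(min(n, audited_per_street[s]) for s, n in listed_per_street.items())
    fp = len(listed) - tp   # listed but without a counterpart in the field
    fn = len(audited) - tp  # found in the field but not listed
    return tp, fp, fn


# Hypothetical records for illustration only.
listed = [
    {"street": "Hoofdstraat", "house_number": "12"},
    {"street": "Hoofdstraat ", "house_number": "14"},
    {"street": "Kerkstraat", "house_number": "1"},
]
audited = [
    {"street": "hoofdstraat", "house_number": "12a"},
    {"street": "Dorpsstraat", "house_number": "3"},
]
counts = relaxed_location_counts(listed, audited)
print(counts)
```

A stricter variant would additionally require equal house numbers, which is the kind of criterion that relaxed matching is meant to avoid.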
Next, agreement statistics such as sensitivity, specificity, positive predictive value (PPV), Cohen’s kappa, and concordance were estimated at the street level (see Table 2). Sensitivity reflected the ability of a data source to correctly capture food retailers that were actually present in the field, and was determined by the proportion of food retailers present in the field that were listed in the commercial dataset. Specificity was determined by the proportion of true negatives (i.e., empty streets) that were correctly classified as containing no food retailers. PPV reflected the proportion of listed food retailers that were also present in the field (food retailers observed in the field that were not listed in the commercial dataset were not considered). Cohen’s kappa measured the agreement between field audit and Locatus data, taking into account agreements that occurred by chance. Concordance assessed the proportion of food retailers listed both in the commercial dataset and present in the field among all the food retailers present.
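As an illustration, the agreement statistics can be computed from the four cell counts of a 2×2 table as follows. The formulas are the standard textbook definitions and are assumed here to correspond to those in Table 2; in particular, concordance is taken as TP/(TP + FP + FN). The function name and the example counts are hypothetical and are not data from this study.

```python
def agreement_statistics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute agreement statistics from 2x2 counts (standard definitions)."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n  # observed proportion of agreement
    # Agreement expected by chance, from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "concordance": tp / (tp + fp + fn),
        "kappa": (observed - expected) / (1 - expected),
    }


# Hypothetical counts for illustration only.
stats = agreement_statistics(tp=80, fp=10, fn=20, tn=40)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

The sketch does not guard against empty cells; a zero denominator (for example, no true negatives and no false positives) would raise an error and would need separate handling.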
Finally, for each of the three matching levels (location, classification, and both combined), agreement statistics were calculated for all food retailers combined, all food retailers combined but stratified by urban and rural areas, and separate food retailer subcategories. The level of agreement was interpreted using the following criteria: <0.30 was considered “poor”, 0.31–0.50 “fair”, 0.51–0.70 “moderate”, 0.71–0.90 “good”, and >0.90 “excellent” [45]. The dataset did not contain missing data. Statistical analyses were performed using SPSS version 25.0 (IBM Corp, 2017, Armonk, NY, USA).
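The interpretation criteria can be expressed as a small lookup function. The sketch below is illustrative: the function name is hypothetical, and because the published bands leave small gaps (e.g., between 0.70 and 0.71), it assumes that a value belongs to a band only once it reaches that band's lower bound.

```python
def interpret_agreement(value: float) -> str:
    """Label an agreement statistic using the bands cited from [45].

    Assumption: values in the gaps between bands (e.g., 0.705) are assigned
    to the lower band.
    """
    if value > 0.90:
        return "excellent"
    if value >= 0.71:
        return "good"
    if value >= 0.51:
        return "moderate"
    if value >= 0.31:
        return "fair"
    return "poor"


for v in (0.914, 0.897, 0.708, 0.500):
    print(v, interpret_agreement(v))
```

With this convention, a value of 0.708 is labeled “moderate” rather than “good”.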

3. Results

3.1. Descriptive Statistics

In the 152 selected streets, the Locatus dataset indicated 322 food retailers to be present, of which 276 were located in urban areas and 46 were located in rural areas. Of the 322 food retailers listed, 5.3% were supermarkets, 0.6% were fruit and vegetable stores, 3.1% were bakeries, 6.8% were animal product stores, 2.5% were natural product stores, 1.9% were convenience stores, 4% were confectionery stores, 42.5% were restaurants, 10.2% were fast food restaurants, 6.8% were take away restaurants, and 16.1% were cafés (no local product shops were listed in the Locatus dataset). Via the field audit, 315 food retailers were identified, of which 265 were located in urban areas and 50 were located in rural areas. Of the 315 food retailers analyzed, 4.8% were supermarkets, 1.9% were local product shops, 1% were fruit and vegetable stores, 2.2% were bakeries, 5.7% were animal product stores, 3.2% were natural product stores, 1% were convenience stores, 2.9% were confectionery stores, 48.3% were restaurants, 9.5% were fast food restaurants, 4.1% were take away restaurants, and 15.6% were cafés (see Table 3).
Of the 322 food retailers present in the Locatus dataset, 246 matched the food retailers found in the field, while 1 food retailer had a wrong address (1 animal product store). A total of 42 were wrongly classified (1 bakery, 8 animal product stores, 1 natural product store, 3 convenience stores, 3 restaurants, 4 fast food restaurants, 8 take away restaurants, and 14 cafés). In most instances, there was a “close” mismatch, such as a fast food outlet classified as a take away outlet, or a café being classified as a restaurant. In some instances, a mismatch in store name indicated a replacement of the store/restaurant. There were 33 food outlets not found in the field (2 supermarkets, 2 bakeries, 6 confectionery stores, 13 restaurants, 2 fast food restaurants, 4 take away restaurants, and 4 cafés). In addition, 26 of the 315 food retailers found in the field were not listed in Locatus, of which 1 was a local product shop, 1 was a fruit and vegetable store, 3 were animal product stores, 1 was a natural product store, 1 was a confectionery store, 10 were restaurants, 1 was a take away restaurant, and 8 were cafés.
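The counts in this paragraph can be cross-checked arithmetically. The short Python sketch below copies the reported numbers and confirms that the subcategory breakdowns add up to the stated totals; it assumes, as an interpretation of the text, that the wrongly addressed and wrongly classified retailers were nonetheless found in the field. The variable names are illustrative.

```python
# Counts copied from the paragraph above.
listed_total = 322    # retailers listed in Locatus
audited_total = 315   # retailers found by the field audit
matched = 246         # listed retailers matching those found in the field
wrong_address = 1

misclassified = {
    "bakery": 1, "animal product store": 8, "natural product store": 1,
    "convenience store": 3, "restaurant": 3, "fast food restaurant": 4,
    "take away restaurant": 8, "café": 14,
}
not_found = {  # listed in Locatus, absent in the field
    "supermarket": 2, "bakery": 2, "confectionery store": 6, "restaurant": 13,
    "fast food restaurant": 2, "take away restaurant": 4, "café": 4,
}
not_listed = {  # found in the field, absent from Locatus
    "local product shop": 1, "fruit and vegetable store": 1,
    "animal product store": 3, "natural product store": 1,
    "confectionery store": 1, "restaurant": 10,
    "take away restaurant": 1, "café": 8,
}

n_misclassified = sum(misclassified.values())
n_not_found = sum(not_found.values())
n_not_listed = sum(not_listed.values())

# Every listed retailer is matched, wrongly addressed, misclassified, or not found.
listed_check = matched + wrong_address + n_misclassified + n_not_found
# Every audited retailer is one of the first three groups, or missing from Locatus.
audited_check = matched + wrong_address + n_misclassified + n_not_listed

print(n_misclassified, n_not_found, n_not_listed, listed_check, audited_check)
```

Both totals reconcile with the 322 listed and 315 audited retailers reported above.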

3.2. Agreement Statistics on the “Location” of Food Retailers

Overall, sensitivity of the location of food retailers was “excellent” (0.914), and PPV and concordance were “good” (0.897; 0.827) (see Table 4). Agreement statistics stratified by urbanization levels showed that in urban areas sensitivity was “excellent” (0.921), and PPV and concordance were “good” (0.887; 0.824). In rural areas, sensitivity was “good” (0.880), PPV was “excellent” (0.957), and concordance was “good” (0.846).
Agreement analyses were also conducted for each of the 12 food retailer subcategories. “Good” to “excellent” sensitivity was observed across all subcategories, except for fruit and vegetable stores (0.667). “Good” to “excellent” PPV was detected for all subcategories, except for confectionery stores (0.538). “Good” to “excellent” concordance was detected for all subcategories, except for fruit and vegetable stores (0.666) and confectionery stores (0.500).

3.3. Agreement Statistics on the “Classification” of Food Retailers

As shown in Table 5, overall PPV for the classification of food retailers was “good” (0.855). Agreement statistics stratified by urbanization level highlighted that PPV was “good” in both urban and in rural areas (0.849, 0.886).
Agreement analyses were also conducted for each of the 12 food retailer subcategories. “Good” to “excellent” PPV was observed across all subcategories, except for animal product stores (0.636), convenience stores (0.500), and take away restaurants (0.556).
For this analysis, only matching food retailers (retailers that were identified by both the commercial dataset and field audit) were considered. Indeed, retailers that were listed in Locatus but not found in the field, or present in the field but not in Locatus, as well as “empty” streets, could not be considered in this analysis because they had no counterparts to be compared with for their classification. In other words, only true positives and false positives were included, since the field audit could only confirm (true positive) or disconfirm (false positive) the classification given by Locatus. Therefore, the only agreement value that could be estimated was the PPV (= TP/(TP + FP)).

3.4. Agreement Statistics on Both “Location and Classification” of Food Retailers

Overall, sensitivity for both the locations and classification of food retailers was “excellent” (0.996), and PPV was “good” (0.854) (see Table 6). Similarly, agreement statistics stratified by urbanization level showed that in both urban and rural areas, sensitivity was “excellent” (0.995 and 1.000, respectively) and PPV was “good” (0.848 and 0.886, respectively).
Agreement analyses were also conducted for each of the 12 food retailer subcategories. “Excellent” sensitivity was observed across all subcategories. “Good” to “excellent” PPV was measured for all subcategories, except for animal product stores (0.619), convenience stores (0.500), take away restaurants (0.556), and cafés (0.708).
Again, only matching food retailers (retailers that were identified by both the commercial dataset and the field audit) were considered. Specificity and Kappa statistics could not be determined to assess agreement with the field audits, because “true negatives” (i.e., streets with no food retailers) were not applicable in this analysis.

4. Discussion

In the present study, we aimed to evaluate the validity of a secondary data source (“Locatus”) containing information on the geographical locations of food retailers against a field audit. Our main results showed that these commercially available data had overall “good” to “excellent” agreement statistics as compared to the field audit data for all three levels of analysis (i.e., location, classification and both combined).
Previous studies have performed validation analysis of secondary data sources [37,38]. In general, commercial retail data and governmental repositories have been reported to have greater validity than other secondary data sources (e.g., yellow pages) [37], with most studies [38] reporting validity scores comparable with the results of the present study. Given the increasing interest in food environment research, transparency about the quality of data sources is of utmost importance. Although conclusions about the validity of secondary data sources may obviously differ due to the characteristics of the specific data under study, some of those differences may be explained by the methodological choices of the researchers and commercial parties. For example, mismatches between the secondary data and field audit data could be related to the temporal difference of the data collection between the two sources (e.g., some shops closed or changed names between the data collection of Locatus and the field audit) and the definition and interpretation of types of food retailers. In turn, insight into these methodological choices may facilitate comparisons and the potential harmonization of food environment datasets across settings and regions.
One methodological choice to be made by researchers pertains to matching criteria. For instance, while several studies have used field audits as the “gold standard” [36,38,43,44,46,47,48,49,50] to explore the validity of secondary data sources, only some adopted “relaxed” matching criteria [38,43,44,47]. “Relaxed” matching criteria tolerate mismatches due to discrepancies in business names or slight imprecisions in location [43,44]. Conversely, when every single mismatch due to business name and location error (e.g., a retailer present on the right street but listed with the wrong house number) is counted, the validity of a dataset may be underestimated. As such, the choice of the matching criteria may to some extent explain differences in conclusions about the validity of secondary data sources [36,38,46].
Another methodological choice to be made by researchers is the area under study, thereby balancing precision and feasibility of the study. As the validity of food environment data may vary across urbanization levels, we explored possible discrepancies between urban and rural areas. Since permanent closures of retailers may be more frequent in rural than in urban areas, commercially available data, which may not capture these changes, have been described as possibly having greater validity in urban than in rural areas [51]. Nevertheless, we did not find considerable differences in agreement statistics between urban and rural areas. Few studies have compared the validity of secondary data sources across urbanization levels. Studies from the UK, validating different secondary data sources at varying levels of urbanization and across socio-economic levels, reported no notable differences across all study areas and fairly high agreement statistics, ranging from “moderate” to “excellent” [44,48]. In the US, even though studies have reported no marked differences across urban and rural areas, the magnitude of the validity scores has varied greatly between secondary data sources [38,46,50]. This suggests that the reason for the observed differences in terms of validity scores across urban and rural areas may be attributable to the data sources themselves, or to the geographic area of interest. For instance, since food retailers in rural areas are generally small and serve a limited number of local residents, some may choose not to be registered in commercial listings or other online secondary data directories [48]. Consequently, some (but not all) secondary data sources may be less able to correctly describe the food environment in a rural area.
Commercial parties’ methodological considerations for the classification of food retailers may also influence the validity and comparability of secondary data sources. To shed light on the usefulness of a commercial classification of food retailers for nutritional or public health research purposes, we also included agreement statistics on the “classification” level of analysis. While overall agreement on “classification” was relatively high, retailers such as animal product stores, convenience stores, and take away restaurants showed lower validity scores compared to other food retailer subcategories. These lower validity scores may be attributable to three aspects of classifying food retailers. Firstly, various retailers were combined into a food retailer category when constructing the field audit classification. For example, the food retailer category “animal product stores” included, among others, delicatessens. While we conceived of delicatessens as retailers selling mainly high-quality animal-based foods, such as cheese and salami, Locatus also considered Italian and Polish shops to be delicatessens. Thus, a high number of false positives may have led to an underestimation of the PPV of a food retailer category. Secondly, misclassification may arise when retailers present multifaceted characteristics and thus have no univocal definition. Studies from the UK, Canada, and the US [44,47,50] reported that convenience stores tend to have lower agreement statistics as compared to other food retailer subcategories. Convenience stores vary widely and include, for instance, gas stations, pharmacies, and country stores. Additionally, in the Netherlands, convenience stores also offer a range of healthy and fresh products, unlike convenience stores in other countries. This variation may make them relatively difficult to recognize as such [29]. This is in contrast with food retailers that present unique and clear characteristics (e.g., fast food chains), which may be easier to detect accurately. Thirdly, the multiple business features of some retailers (e.g., grocery stores that may offer the possibility to sit at a table for on-site consumption, or fast food restaurants that also offer take away service) may hinder the classification process [43], leading to some misclassification. In this case, reasoning in terms of the main business purpose of a given retailer may facilitate the classification process.
Strengths of this study include its novelty, since to our knowledge this is the first study to assess the validity of commercially available data in the Netherlands. Next, the rigorous method used to separately calculate agreement statistics on “location”, “classification”, and both combined has shed light on the specific causes that may affect the validity of commercially available data. In addition, we examined a wide range of food retailers, offering an accurate listing of all common food retailers present in the study areas of interest. Lastly, during the field audit, the auditor was blinded to the commercial dataset in order to prevent the data collection from being influenced by it.
Nevertheless, some limitations need to be highlighted. First, it is worth noting that agreement statistics such as specificity and Kappa on the “location” level of analysis are affected by the low prevalence of food retailers and, in case of the urban areas, the very small number of true negatives (i.e., streets with no food retailers). Therefore, in these cases, the observed specificity and Kappa were excluded from further discussion. Second, due to research time constraints, a limited number of streets were purposively sampled in the eight neighborhoods that were characterized by being relatively small and easily accessible by public transport. However, given the coverage of both urban and rural areas, and the variety in number of food outlets per area, this selection is unlikely to have a major impact on the generalizability of the results. Third, since the field audits were conducted in February 2019 and the Locatus dataset was released in July 2018, some of the non-matching retailers may be attributed to the temporal mismatch between the two datasets. Retailers closing, opening, rebranding, and relocating during the seven-month time frame could have presumably increased the number of false positives and false negatives. Finally, retailers selling alcoholic beverages, establishments whose main business purpose was not selling food, and mobile vendors were not considered in the present study. Future validation studies should consider alternative sources of food and drinks in order to investigate whether they are correctly listed in secondary data sources.
Regardless of our findings, secondary data sources should always be used with caution and we would advise researchers to always validate their commercial data sources for use in health research. Notably, the use of field audits to validate secondary data sources has been described as being less suitable in large urban areas [34]. Collecting data via field audit observations in large geographic areas and in areas with high food retailer densities may be very labor intensive, and consequently not always feasible. Our study was characterized by a relatively small geographic area of interest and a limited number of streets, and thus no particular issues were encountered when conducting the field audit. If, however, secondary data sources cannot be complemented with extensive field work, alternative strategies such as combining at least two secondary data sources to improve the levels of accuracy [46], or the use of remote online-based techniques or street-viewing applications [52], should be considered in order to achieve an adequate alternative to field validations.

5. Conclusions

In this study, we assessed the validity of a secondary data source (“Locatus”) containing information on the geographical locations and types of food retailers against field audit data. In conclusion, overall agreement statistics across urban and rural areas ranged from “good” to “excellent” for all three levels of analysis (i.e., location, classification, and both combined). Therefore, policymakers and researchers should feel confident in using Locatus as a secondary source for assessing location and classification data of food retailers in the Netherlands. In addition, we highlighted a number of methodological considerations that may explain variation in the validity of secondary data sources, and that could be taken into account when comparing or harmonizing different data sources on food environments.

Author Contributions

Conceptualization and Supervision, M.G.M.P., J.L., and J.D.M.; Methodology and Writing—Review & Editing, C.C., M.G.M.P., J.L., and J.D.M.; Formal Analysis, Data Curation and Writing—Original Draft Preparation, C.C.; All authors have read and agreed to the published version of the manuscript.

Funding

M.G.M.P. has received a grant from the Brazilian higher education agency CNPq (National Council for Scientific and Technological Development) as part of the Science Without Borders Program (process number 233850/2014-7). J.D.M.’s work was funded by an NWO VENI grant on “Making the healthy choice easier – role of the local food environment” (grant number 451-17-032).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Swinburn, B.; Sacks, G.; Vandevijvere, S.; Kumanyika, S.; Lobstein, T.; Neal, B.; Barquera, S.; Friel, S.; Hawkes, C.; Kelly, B.; et al. INFORMAS (International Network for Food and Obesity/non-communicable diseases Research, Monitoring and Action Support): Overview and key principles. Obes. Rev. 2013, 14, 1–12.
  2. Winson, A. Bringing Political Economy into the Debate on the Obesity Epidemic. Agric. Hum. Values 2004, 21, 299–312.
  3. Booth, S.L.; Sallis, J.F.; Ritenbaugh, C.; Hill, J.O.; Birch, L.L.; Frank, L.D.; Glanz, K.; Himmelgreen, D.A.; Mudd, M.; Popkin, B.M.; et al. Environmental and Societal Factors Affect Food Choice and Physical Activity: Rationale, Influences, and Leverage Points. Nutr. Rev. 2001, 59, S21–S39.
  4. Frieden, T.R. A Framework for Public Health Action: The Health Impact Pyramid. Am. J. Public Health 2010, 100, 590–595.
  5. Lytle, L.A.; Sokol, R.L. Measures of the food environment: A systematic review of the field, 2007–2015. Health Place 2017, 44, 18–34.
  6. Laska, M.N.; Hearst, M.O.; Forsyth, A.; Pasch, K.E.; Lytle, L. Neighbourhood food environments: Are they associated with adolescent dietary intake, food purchases and weight status? Public Health Nutr. 2010, 13, 1757–1763.
  7. Davis, B.; Carpenter, C. Proximity of Fast-Food Restaurants to Schools and Adolescent Obesity. Am. J. Public Health 2009, 99, 505–510.
  8. He, M.; Tucker, P.; Irwin, J.D.; Gilliland, J.; Larsen, K.; Hess, P. Obesogenic neighbourhoods: The impact of neighbourhood restaurants and convenience stores on adolescents’ food consumption behaviours. Public Health Nutr. 2012, 15, 2331–2339.
  9. Powell, L.M.; Auld, M.C.; Chaloupka, F.J.; O’Malley, P.M.; Johnston, L.D. Associations Between Access to Food Stores and Adolescent Body Mass Index. Am. J. Prev. Med. 2007, 33, S301–S307.
  10. Dengel, D.R.; Hearst, M.O.; Harmon, J.H.; Forsyth, A.; Lytle, L.A. Does the built environment relate to the metabolic syndrome in adolescents? Health Place 2009, 15, 946–951.
  11. Virtanen, M.; Kivimäki, H.; Ervasti, J.; Oksanen, T.; Pentti, J.; Kouvonen, A.; Halonen, J.I.; Kivimäki, M.; Vahtera, J. Fast-food outlets and grocery stores near school and adolescents’ eating habits and overweight in Finland. Eur. J. Public Health 2015, 25, 650–655.
  12. Shareck, M.; Lewis, D.; Smith, N.R.; Clary, C.; Cummins, S. Associations between home and school neighbourhood food environments and adolescents’ fast-food and sugar-sweetened beverage intakes: Findings from the Olympic Regeneration in East London (ORiEL) Study. Public Health Nutr. 2018, 21, 2842–2851.
  13. Hearst, M.O.; Pasch, K.E.; Laska, M.N. Urban v. suburban perceptions of the neighbourhood food environment as correlates of adolescent food purchasing. Public Health Nutr. 2011, 15, 299–306.
  14. Sharkey, J.R.; Dean, W.R.; Nalty, C.C.; Xu, J. Convenience stores are the key food environment influence on nutrients available from household food supplies in Texas Border Colonias. BMC Public Health 2013, 13, 45.
  15. Lamichhane, A.P.; Mayer-Davis, E.J.; Puett, R.; Bottai, M.; Porter, D.E.; Liese, A.D. Associations of Built Food Environment with Dietary Intake among Youth with Diabetes. J. Nutr. Educ. Behav. 2012, 44, 217–224.
  16. Pearce, J.; Hiscock, R.; Blakely, T.; Witten, K. The contextual effects of neighbourhood access to supermarkets and convenience stores on individual fruit and vegetable consumption. J. Epidemiol. Community Health 2008, 62, 198–201.
  17. Lucan, S.C.; Hillier, A.; Schechter, C.B.; Glanz, K. Objective and Self-Reported Factors Associated With Food-Environment Perceptions and Fruit-And-Vegetable Consumption: A Multilevel Analysis. Prev. Chronic Dis. 2014, 11, 130324.
  18. Richardson, A.S.; Boone-Heinonen, J.; Popkin, B.M.; Gordon-Larsen, P. Neighborhood fast food restaurants and fast food consumption: A national study. BMC Public Health 2011, 11, 543.
  19. Hulst, A.V.; Barnett, T.A.; Gauvin, L.; Daniel, M.; Kestens, Y.; Bird, M.; Gray-Donald, K.; Lambert, M. Associations between children’s diets and features of their residential and school neighbourhood food environments. Can. J. Public Health 2012, 103, 48–54.
  20. Block, J.P.; Christakis, N.A.; O’Malley, A.J.; Subramanian, S.V. Proximity to Food Establishments and Body Mass Index in the Framingham Heart Study Offspring Cohort Over 30 Years. Am. J. Epidemiol. 2011, 174, 1108–1114.
  21. An, R.; Sturm, R. School and Residential Neighborhood Food Environment and Diet among California Youth. Am. J. Prev. Med. 2012, 42, 129–135.
  22. Lee, H. The role of local food availability in explaining obesity risk among young school-aged children. Soc. Sci. Med. 2012, 74, 1193–1203.
  23. Shier, V.; An, R.; Sturm, R. Is there a robust relationship between neighbourhood food environment and childhood obesity in the USA? Public Health 2012, 126, 723–730.
  24. Timperio, A.; Ball, K.; Roberts, R.; Campbell, K.; Andrianopoulos, N.; Crawford, D. Children’s fruit and vegetable intake: Associations with the neighbourhood food environment. Prev. Med. 2008, 46, 331–335.
  25. Leung, C.W.; Gregorich, S.E.; Laraia, B.A.; Kushi, L.H.; Yen, I.H. Measuring the neighborhood environment: Associations with young girls’ energy intake and expenditure in a cross-sectional study. Int. J. Behav. Nutr. Phys. Act. 2010, 7, 52.
  26. Ahern, M.; Brown, C.; Dukas, S. A National Study of the Association between Food Environments and County-Level Health Outcomes. J. Rural Health 2011, 27, 367–379.
  27. Kestens, Y.; Lebel, A.; Chaix, B.; Clary, C.; Daniel, M.; Pampalon, R.; Theriault, M.; Subramanian, S.V.P. Association between Activity Space Exposure to Food Establishments and Individual Risk of Overweight. PLoS ONE 2012, 7, e41418.
  28. Jones-Smith, J.C.; Karter, A.J.; Warton, E.M.; Kelly, M.; Kersten, E.; Moffet, H.H.; Adler, N.; Schillinger, D.; Laraia, B.A. Obesity and the Food Environment: Income and Ethnicity Differences Among People With Diabetes. Diabetes Care 2013, 36, 2697–2705.
  29. Lucan, S.C. Concerning limitations of food environment research: A narrative review and commentary framed around obesity and diet-related diseases in youth. J. Acad. Nutr. Diet. 2015, 115, 205–212.
  30. Glanz, K. Measuring food environments: A historical perspective. Am. J. Prev. Med. 2009, 36, S93–S98.
  31. Sharkey, J.R. Measuring potential access to food stores and food-service places in rural areas in the US. Am. J. Prev. Med. 2009, 36, S151–S155.
  32. Forsyth, A.; Lytle, L.; Van Riper, D. Finding food: Issues and challenges in using Geographic Information Systems to measure food access. J. Transp. Land Use 2010, 3, 43.
  33. Brownson, R.C.; Hoehner, C.M.; Day, K.; Forsyth, A.; Sallis, J.F. Measuring the built environment for physical activity: State of the science. Am. J. Prev. Med. 2009, 36, S99–S123.
  34. Lake, A.A.; Burgoine, T.; Greenhalgh, F.; Stamp, E.; Tyrrell, R. The foodscape: Classification and field validation of secondary data sources. Health Place 2010, 16, 666–673.
  35. Wang, M.C.; Gonzalez, A.A.; Ritchie, L.D.; Winkleby, M.A. The neighborhood food environment: Sources of historical data on retail food stores. Int. J. Behav. Nutr. Phys. Act. 2006, 3, 15.
  36. Powell, L.M.; Han, E.; Zenk, S.N.; Khan, T.; Quinn, C.M.; Gibbs, K.P.; Pugach, O.; Barker, D.C.; Resnick, E.A.; Myllyluoma, J.; et al. Field validation of secondary commercial data sources on the retail food outlet environment in the US. Health Place 2011, 17, 1122–1131.
  37. Fleischhacker, S.E.; Evenson, K.R.; Sharkey, J.; Pitts, S.B.J.; Rodriguez, D.A. Validity of secondary retail food outlet data: A systematic review. Am. J. Prev. Med. 2013, 45, 462–473.
  38. Lebel, A.; Daepp, M.I.G.; Block, J.P.; Walker, R.; Lalonde, B.; Kestens, Y.; Subramanian, S.V. Quantifying the foodscape: A systematic review and meta-analysis of the validity of commercially available business data. PLoS ONE 2017, 12, e0174417.
  39. European Commission. Regions in the European Union. Nomenclature of Territorial Units for Statistics NUTS 2010/EU-27. 2011. Available online: https://ec.europa.eu/eurostat/documents/3859598/5916917/KS-RA-11-011-EN.PDF (accessed on 1 March 2019).
  40. Centraal Bureau Voor de Statistiek. Kerncijfers Wijken en Buurten 2018. 2018. Available online: https://www.cbs.nl/nl-nl/maatwerk/2018/30/kerncijfers-wijken-en-buurten-2018 (accessed on 1 March 2019).
  41. Den, C.D.; Van, H.D.S.; Vliegen, J.M. A new measure for degree of urbanization: The address density of the surrounding area. Maandstat Bevolk. 1992, 40, 14–27.
  42. Locatus. Retail data. Available online: https://locatus.com/en/retail-data/ (accessed on 1 March 2019).
  43. Clary, C.M.; Kestens, Y. Field validation of secondary data sources: A novel measure of representativity applied to a Canadian food outlet database. Int. J. Behav. Nutr. Phys. Act. 2013, 10, 77.
  44. Wilkins, E.L.; Radley, D.; Morris, M.A.; Griffiths, C. Examining the validity and utility of two secondary sources of food environment data against street audits in England. Nutr. J. 2017, 16, 82.
  45. Janse, A.; Gemke, R.; Uiterwaal, C.; Tweel, I.V.D.; Kimpen, J.; Sinnema, G. Quality of life: Patients and doctors don’t always agree: A meta-analysis. J. Clin. Epidemiol. 2004, 57, 653–661.
  46. Liese, A.D.; Colabianchi, N.; Lamichhane, A.P.; Barnes, T.L.; Hibbert, J.D.; Porter, D.E.; Nichols, M.D.; Lawson, A.B. Validation of 3 food outlet databases: Completeness and geospatial accuracy in rural and urban food environments. Am. J. Epidemiol. 2010, 172, 1324–1333.
  47. Paquet, C.; Daniel, M.; Kestens, Y.; Leger, K.; Gauvin, L. Field validation of listings of food stores and commercial physical activity establishments from secondary data. Int. J. Behav. Nutr. Phys. Act. 2008, 5, 58.
  48. Lake, A.A.; Burgoine, T.; Stamp, E.; Grieve, R. The foodscape: Classification and field validation of secondary data sources across urban/rural and socio-economic classifications in England. Int. J. Behav. Nutr. Phys. Act. 2012, 9, 37.
  49. Toft, U.; Erbs-Maibing, P.; Glümer, C. Identifying fast-food restaurants using a central register as a measure of the food environment. Scand. J. Public Health 2011, 39, 864–869.
  50. Fleischhacker, S.E.; Rodriguez, D.A.; Evenson, K.R.; Henley, A.; Gizlice, Z.; Soto, D.; Ramachandran, G. Evidence for validity of five secondary data sources for enumerating retail food outlets in seven American Indian communities in North Carolina. Int. J. Behav. Nutr. Phys. Act. 2012, 9, 137.
  51. Gustafson, A.A.; Lewis, S.; Wilson, C.; Jilcott-Pitts, S. Validation of food store environment secondary data source and the role of neighborhood deprivation in Appalachia, Kentucky. BMC Public Health 2012, 12, 688.
  52. Charreire, H.; Mackenbach, J.; Ouasti, M.; Lakerveld, J.; Compernolle, S.; Ben-Rebah, M.; Mckee, M.; Brug, J.; Rutter, H.; Oppert, J.M. Using remote sensing to define environmental characteristics related to physical activity and dietary behaviours: A systematic review (the SPOTLIGHT project). Health Place 2014, 25, 1–9.

Table 1. Field audit-derived classification of food retailers based on Locatus’s definitions.

| Retail Category | Field Audit Subcategories | Locatus’s Categories |
|---|---|---|
| Grocery stores | Supermarkets | Supermarkets |
| | Local product shops | Toko, foreign country shops (others) |
| | Fruit and vegetable stores | Vegetable/fruit stores |
| | Bakeries | Bakeries |
| | Animal product stores | Cheese stores, poultry stores, butcheries, delicatessen, fish stores |
| | Natural product stores | Health food stores, coffee/tea stores, nut stores |
| | Convenience stores | Minimarkets, night shops |
| | Confectionery stores | Pastry stores, chocolate stores, ice-cream saloons, candy stores |
| Food outlets | Restaurants | Full-service restaurants, café-restaurants, pancake restaurants, hotel-restaurants |
| | Fast food restaurants | Fast food restaurants, grillroom/shoarma/pita places |
| | Take away restaurants | Delivery/take away outlets |
| | Cafés | Coffee houses, lunchrooms |

Table 2. Measures of validity of commercially available data as compared to field audit data.

| | Field audit data: Present | Field audit data: Absent |
|---|---|---|
| Commercially available data: Present | TP | FP |
| Commercially available data: Absent | FN | TN |

| Validity Score | Formula |
|---|---|
| Sensitivity | $\frac{TP}{TP + FN}$ |
| Specificity | $\frac{TN}{TN + FP}$ |
| PPV | $\frac{TP}{TP + FP}$ |
| Kappa | $\frac{p_o - p_e}{1 - p_e}$ |
| Concordance | $\frac{TP}{TP + FP + FN}$ |

TP, true positive; FP, false positive; FN, false negative; TN, true negative; PPV, positive predictive value; $p_o$, observed agreement $= \frac{TP + TN}{TP + FP + FN + TN}$; $p_e$, expected agreement $= \left( \frac{TP + FP}{TP + FP + FN + TN} \times \frac{TP + FN}{TP + FP + FN + TN} \right) + \left( \frac{FN + TN}{TP + FP + FN + TN} \times \frac{FP + TN}{TP + FP + FN + TN} \right)$.
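
The validity scores defined in Table 2 can be computed directly from the four cell counts. The following is a minimal Python sketch for illustration, not the authors' analysis code; the names `ConfusionCounts` and `agreement_statistics` are hypothetical, and returning `None` for a zero denominator is an assumption made here for undefined scores.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of the 2x2 table: commercial data (rows) vs. field audit (columns)."""
    tp: int  # listed in the commercial data and present in the field
    fp: int  # listed in the commercial data but absent in the field
    fn: int  # present in the field but not listed
    tn: int  # neither listed nor present


def _ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def agreement_statistics(c: ConfusionCounts) -> Dict[str, Optional[float]]:
    """Compute the five validity scores defined in Table 2."""
    n = c.tp + c.fp + c.fn + c.tn
    kappa = None
    if n:
        # Observed agreement and agreement expected by chance.
        p_o = (c.tp + c.tn) / n
        p_e = ((c.tp + c.fp) * (c.tp + c.fn)
               + (c.fn + c.tn) * (c.fp + c.tn)) / n ** 2
        kappa = _ratio(p_o - p_e, 1 - p_e)
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "kappa": kappa,
        "concordance": _ratio(c.tp, c.tp + c.fp + c.fn),
    }


if __name__ == "__main__":
    # Overall "location" counts as reported in Table 4.
    overall = agreement_statistics(ConfusionCounts(tp=288, fp=33, fn=27, tn=61))
    for name, value in overall.items():
        print(f"{name}: {value:.3f}")
```

With the overall “location” counts from Table 4 (TP = 288, FN = 27, FP = 33, TN = 61), this sketch gives a sensitivity of 0.914, a specificity of 0.649, a PPV of 0.897, and a Kappa of 0.576, in line with the reported values. Concordance evaluates to 0.8276, which Table 4 lists as 0.827.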

Table 3. Descriptive statistics derived by comparing the Locatus data against the field audit data.

| Category | Subcategory | No. of Food Retailers Listed in Locatus (Collected Until July 2018) | No. of Food Retailers Found in the Field (Collected between Feb 22 and March 2, 2019) | Matching * | Non-Matching: Error in Location * | Non-Matching: Error in Classification * | Non-Matching: Error in Both Location and Classification * | Non-Matching: Not Found in the Field * | Non-Matching: Found in the Field But Not Listed |
|---|---|---|---|---|---|---|---|---|---|
| Total | | 322 | 315 | 246 | 1 | 42 | 0 | 33 | 26 |
| Urbanization | Urban | 276 | 265 | 207 | 1 | 37 | 0 | 31 | 20 |
| | Rural | 46 | 50 | 39 | 0 | 5 | 0 | 2 | 6 |
| Grocery stores (N) | Supermarkets | 17 | 15 | 15 | 0 | 0 | 0 | 2 | 0 |
| | Local product shops | 0 | 6 | 0 | 0 | 0 | 0 | 0 | 1 |
| | Fruit and vegetable stores | 2 | 3 | 2 | 0 | 0 | 0 | 0 | 1 |
| | Bakeries | 10 | 7 | 7 | 0 | 1 | 0 | 2 | 0 |
| | Animal product stores | 22 | 18 | 13 | 1 | 8 | 0 | 0 | 3 |
| | Natural product stores | 8 | 10 | 7 | 0 | 1 | 0 | 0 | 1 |
| | Convenience stores | 6 | 3 | 3 | 0 | 3 | 0 | 0 | 0 |
| | Confectionery stores | 13 | 9 | 7 | 0 | 0 | 0 | 6 | 1 |
| Food outlets (N) | Restaurants | 137 | 152 | 121 | 0 | 3 | 0 | 13 | 10 |
| | Fast food restaurants | 33 | 30 | 27 | 0 | 4 | 0 | 2 | 0 |
| | Take away restaurants | 22 | 13 | 10 | 0 | 8 | 0 | 4 | 1 |
| | Cafés | 52 | 49 | 34 | 0 | 14 | 0 | 4 | 8 |

Note that not all numbers add up, as some food retailers found in the field but not listed by Locatus were actually listed by Locatus as another category of food outlet (e.g., of the six local product shops found in the field that were not listed by Locatus, only one was not listed at all by Locatus, while another five were listed by Locatus but not as local product shops: three were listed as convenience stores and two as animal product stores). * Frequency and percentage of food retailers listed in Locatus that matched or did not match the food retailers ascertained in the field. Frequency and percentage of food retailers found in the field that were not listed in Locatus. Urbanization levels as defined by the Centraal Bureau voor de Statistiek (CBS).

Table 4. Agreement statistics on “location” of food retailers for the Locatus dataset.

| By Category | By Subcategory | TP * | FN | FP | TN | Sensitivity | Specificity | PPV | Kappa | Concordance |
|---|---|---|---|---|---|---|---|---|---|---|
| Overall | | 288 | 27 | 33 | 61 | 0.914 | 0.649 | 0.897 | 0.576 | 0.827 |
| Urbanization | Urban | 244 | 21 | 31 | 3 | 0.921 | 0.088 | 0.887 | 0.010 | 0.824 |
| | Rural | 44 | 6 | 2 | 58 | 0.880 | 0.967 | 0.957 | 0.852 | 0.846 |
| Grocery stores | Supermarkets | 15 | 0 | 2 | 61 | 1.000 | 0.968 | 0.882 | 0.921 | 0.882 |
| | Local product shops | 0 | 1 | 0 | 61 | - | 1.000 | - | - | - |
| | Fruit and vegetable stores | 2 | 1 | 0 | 61 | 0.667 | 1.000 | 1.000 | 0.792 | 0.666 |
| | Bakeries | 8 | 0 | 2 | 61 | 1.000 | 0.968 | 0.800 | 0.873 | 0.800 |
| | Animal product stores | 21 | 4 | 0 | 61 | 0.840 | 1.000 | 1.000 | 0.882 | 0.840 |
| | Natural product stores | 8 | 1 | 0 | 61 | 0.889 | 1.000 | 1.000 | 0.933 | 0.888 |
| | Convenience stores | 6 | 0 | 0 | 61 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| | Confectionery stores | 7 | 1 | 6 | 61 | 0.875 | 0.910 | 0.538 | 0.616 | 0.500 |
| Food outlets | Restaurants | 124 | 10 | 13 | 61 | 0.925 | 0.824 | 0.905 | 0.757 | 0.843 |
| | Fast food restaurants | 31 | 0 | 2 | 61 | 1.000 | 0.968 | 0.939 | 0.953 | 0.939 |
| | Take away restaurants | 18 | 1 | 4 | 61 | 0.947 | 0.938 | 0.818 | 0.839 | 0.782 |
| | Cafés | 48 | 8 | 4 | 61 | 0.857 | 0.938 | 0.923 | 0.800 | 0.800 |

TP, true positive; FN, false negative; FP, false positive; TN, true negative; PPV, positive predictive value. * Number of stores correctly located (246) and number of stores wrongly classified (42) but found to be in the correct location.

Table 5. Agreement statistics on “classification” of food retailers for the Locatus dataset.

| By Category | By Subcategory | TP * | FP | PPV |
|---|---|---|---|---|
| Overall | | 247 | 42 | 0.855 |
| Urbanization | Urban | 208 | 37 | 0.849 |
| | Rural | 39 | 5 | 0.886 |
| Grocery stores | Supermarkets | 15 | 0 | 1.000 |
| | Local product shops | - | - | - |
| | Fruit and vegetable stores | 2 | 0 | 1.000 |
| | Bakeries | 7 | 1 | 0.875 |
| | Animal product stores | 14 | 8 | 0.636 |
| | Natural product stores | 7 | 1 | 0.875 |
| | Convenience stores | 3 | 3 | 0.500 |
| | Confectionery stores | 7 | 0 | 1.000 |
| Food outlets | Restaurants | 121 | 3 | 0.976 |
| | Fast food restaurants | 27 | 4 | 0.871 |
| | Take away restaurants | 10 | 8 | 0.556 |
| | Cafés | 34 | 14 | 0.708 |

TP, true positive; FP, false positive; PPV, positive predictive value. * Number of stores correctly classified (246) and number of stores wrongly located (1) but correctly classified.

Table 6. Agreement statistics on both the “location and classification” of food retailers for the Locatus dataset.

| By Category | By Subcategory | TP * | FN | FP | Sensitivity | PPV | Concordance |
|---|---|---|---|---|---|---|---|
| Overall | | 246 | 1 | 42 | 0.996 | 0.854 | 0.851 |
| Urbanization | Urban | 207 | 1 | 37 | 0.995 | 0.848 | 0.845 |
| | Rural | 39 | 0 | 5 | 1.000 | 0.886 | 0.886 |
| Grocery stores | Supermarkets | 15 | 0 | 0 | 1.000 | 1.000 | 1.000 |
| | Local product shops | - | - | - | - | - | - |
| | Fruit and vegetable stores | 2 | 0 | 0 | 1.000 | 1.000 | 1.000 |
| | Bakeries | 7 | 0 | 1 | 1.000 | 0.875 | 0.875 |
| | Animal product stores | 13 | 1 | 8 | 0.929 | 0.619 | 0.590 |
| | Natural product stores | 7 | 0 | 1 | 1.000 | 0.875 | 0.875 |
| | Convenience stores | 3 | 0 | 3 | 1.000 | 0.500 | 0.500 |
| | Confectionery stores | 7 | 0 | 0 | 1.000 | 1.000 | 1.000 |
| Food outlets | Restaurants | 121 | 0 | 3 | 1.000 | 0.976 | 0.976 |
| | Fast food restaurants | 27 | 0 | 4 | 1.000 | 0.871 | 0.871 |
| | Take away restaurants | 10 | 0 | 8 | 1.000 | 0.556 | 0.556 |
| | Cafés | 34 | 0 | 14 | 1.000 | 0.708 | 0.708 |

TP, true positive; FN, false negative; FP, false positive; PPV, positive predictive value. * Number of stores correctly located and classified (246).