In 2016, non-communicable diseases (e.g., cardiovascular disease, cancer, obesity, and type 2 diabetes) were responsible for 39.5 million deaths worldwide [1
]. For these diseases, nutrition-related behaviours are recognised as some of the main risk factors and are considered key elements in public health policies, as they represent modifiable determinants of health that can be addressed through primary prevention interventions [2
]. Therefore, various strategies and public policies have been introduced worldwide to improve people’s diets [7
]. Among them, the provision of nutrition information via front-of-pack labels (FoPLs) has been attracting growing attention from public health authorities. As FoPLs provide information on the nutritional content (or quality) of pre-packaged food products, they can help consumers to make healthier food choices at the point of purchase [4
]. Moreover, FoPLs are postulated to encourage food manufacturers to reformulate their products, increasing their healthfulness and thereby improving the FoPLs displayed on them [13
]. Due to these individual and market-level considerations, simulation studies suggest that the adoption of FoP nutrition labelling constitutes a cost-effective means of achieving health benefits [15].
For a FoPL to be useful in purchasing situations, consumers first need to understand the information it provides [17
]. Understanding can be distinguished as either subjective or objective understanding. Subjective understanding refers to the meaning attached by consumers to the label information and the extent to which they believe they have understood this information, while objective understanding is defined as the consumer’s capacity to interpret the information conveyed by the FoPL as intended by its designers [17
]. As such, subjective understanding is usually captured by a self-administered questionnaire in which participants report the extent to which they believe they understand the information conveyed by a FoPL. Objective understanding, on the other hand, is captured by requiring participants to complete a task in which understanding is tested, such as ranking or selection tasks with visuals of food products displaying FoPLs. Objective understanding is influenced by a number of factors, both at the individual level (e.g., interest in and/or knowledge about nutrition, sociodemographic characteristics) and at the FoPL level (e.g., graphical design) [17
]. Over the last decade, a number of different label designs have been developed, including nutrient-specific labels that display information on the content of a given nutrient and summary labels that provide an assessment of the overall nutritional quality of a given food product. Nutrient-specific labels can be divided into three categories: (i) numeric-only, such as the Reference Intakes (RIs) developed in 2006 and applied internationally by the food industry [18
]; (ii) colour-coded labels, such as the Multiple Traffic Lights (MTL) label that was first implemented in the United Kingdom (UK) in 2005 (with each colour associated with the nutrient amount: red for a high amount, amber for a moderate amount, and green for a low amount) [19
]; and (iii) warning labels, such as the Warning symbol (first implemented in 2016 in Chile [20
]) that advises when the level of a given nutrient exceeds what is considered a healthy amount. Summary FoPLs can be categorised as (i) scale-based graded labels indicating the overall nutritional quality of the product, such as the Nutri-Score adopted in France in 2017 [21
] and the Health Star Rating (HSR) system that first appeared on food packages in Australia in 2014 [22
]; and (ii) endorsement symbols applied only to healthier products in a given food category and based on pre-set limits regarding the level of certain nutrients. Examples include the Choices label introduced in the 2000s in the Netherlands [23
] and the Green Keyhole symbol introduced in the 1980s in Sweden and later in Denmark [24
]. Except for nutrient-specific numeric FoPLs, which are purely informative, all other labels entail some level of interpretation of nutritional content through the use of colours, graphics, and/or textual elements and can be considered as interpretive labels.
Literature reviews have concluded that FoPLs are generally favourably perceived and can increase consumers’ awareness of the healthiness of various food products [25
]. Moreover, interpretive labels tend to be better understood by consumers than purely informative labels [28
]. In recent years, there has been a steep increase in the number of studies comparing the effectiveness of various FoPLs [29
]; however, the number of FoPLs compared in each study is typically small and more recent models (such as warning labels and summary graded FoPLs) are understudied. A growing number of countries are considering introducing FoPLs as a national public health tool, and some studies have revealed differences in consumer understanding and the effectiveness of FoPL formats across countries [40
]. However, studies comparing different FoPLs across diverse cultural contexts are scarce.
To address this research gap, an international comparative study with an experimental design was conducted by two research teams to assess the effectiveness of various FoPLs across 12 countries. The FOP-ICE (Front-Of-Pack International Comparative Experimental) study investigated various aspects of consumers’ reactions to FoPLs, including attitudes, understanding, and impact on food choice. The present analysis focuses on consumers’ objective understanding of five FoPLs currently in use around the world (including nutrient-specific and summary labels: HSR, MTL, Nutri-Score, RIs, and Warning symbol) using a randomised experimental design.
From April to July 2018, 12,015 participants were recruited in Argentina, Australia, Bulgaria, Canada, Denmark, France, Germany, Mexico, Singapore, Spain, the UK, and the United States of America (USA). In each country, recruitment was carried out through the same ISO-accredited international web panel provider (PureProfile) using quota sampling accounting for age (one-third of recruited participants in each of the following age categories: 18–30 years, 31–50 years, over 51 years), sex (50% women), and socioeconomic status (one-third of recruited participants in each of the following household income levels: low, medium, and high), to ensure equal coverage of the major population groups. Income brackets were calculated by estimating the median household income within each country and then creating a bracket of ±33% around this median, corresponding to the medium income band. Incomes below or above were considered as low- or high-income bands, respectively. To increase the ecological validity of the study, individuals who reported never or rarely purchasing at least two of the three food product categories tested in the study (pizzas, cakes, and breakfast cereals) were deemed ineligible to participate, because they would be unlikely to make these purchase decisions in real life.
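The income banding described above can be sketched as follows. This is a minimal illustration rather than the study's own code: the function name `income_band`, the inclusive treatment of the band edges, and the example figures are assumptions introduced here.

```python
def income_band(household_income: float, country_median: float,
                half_width: float = 0.33) -> str:
    """Classify a household income relative to the national median.

    The medium band is the median +/- 33% (per the text); treating the
    band edges as part of the medium band is an assumption.
    """
    lower = country_median * (1 - half_width)
    upper = country_median * (1 + half_width)
    if household_income < lower:
        return "low"
    if household_income > upper:
        return "high"
    return "medium"


# Hypothetical median of 60,000 (currency units are arbitrary):
# the medium band then runs from roughly 40,200 to 79,800.
bands = [income_band(x, 60_000) for x in (30_000, 50_000, 90_000)]
```

Because the band is defined relative to each country's own median, the same function applies unchanged across all 12 countries.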
The protocol of the present study was approved by the Institutional Review Board of the French Institute for Health and Medical Research (IRB Inserm n°17-404) and the Curtin University Human Research Ethics Committee (approval reference: HRE2017-0760).
2.2. Design and Stimuli
Three food categories were selected for stimuli development according to two main criteria: (i) high variability in nutritional quality within the category and (ii) consumed in all 12 countries included in the study. Mock packages representing a fictional brand (“Stofer”) were used as stimuli to prevent other factors from interfering with product evaluation (e.g., familiarity, loyalty, and habit). The mock packages were created to resemble real food products, and a zoom function was developed to allow participants to enlarge any area of the package, including the FoPL. Within each food category, a set of three products with distinct nutritional profiles (lower, intermediate, and higher nutritional quality) was created to allow ranking, and the same food products were used across the different FoPL conditions. No other nutritional information or quality indicators (e.g., organic certification) appeared on the mock packages, so as not to influence participants’ perceptions of the products. All FoPL variants appeared in the same place on a given food product, and covered roughly the same surface area on the package. An example of the set of pizzas used in the study with the five corresponding FoPLs tested is shown in Figure 1
; the two other sets (cakes and breakfast cereals) are shown in Figures S1 and S2.
Participants were invited to complete an online survey hosted by an international web panel provider. The online survey was translated into the relevant language for each country: English for Australia, Canada, Singapore, the UK, and the USA; Spanish for Argentina, Mexico, and Spain; German for Germany; French for France; and Bulgarian for Bulgaria. Eligible participants were asked to provide information on their sex, age, income, household composition, educational level, involvement in grocery shopping, and self-estimated level of nutrition knowledge and diet quality. Following the socio-demographic, lifestyle, and nutrition-related questions, participants were presented with the initial task that asked them to rank the three sets of three label-free products (one set of three pizzas, one set of three cakes, and one set of three breakfast cereals) according to their nutritional quality. For each product, participants could choose from the following options: “1—Highest nutritional quality”, “2—Medium nutritional quality”, and “3—Lowest nutritional quality” (an “I don’t know” option was also included). Participants were subsequently randomised to one of the five FoPL conditions (HSR, MTL, Nutri-Score, RIs, and Warning symbol) and asked to repeat the same ranking task, this time with one of the five FoPLs displayed on the mock packages, according to the randomisation arm. Participants were not aware that they would be seeing the products twice, or that a FoPL would be present on the second viewing. Any potential presentation order effects were controlled for by randomising the order in which the products and the categories appeared on the screen. Participants’ objective understanding of a FoPL was assessed by comparing their ranking task results between the no label and FoPL conditions. This comparison estimated the ability of individuals to use the information conveyed by the FoPL to correctly rank products according to their nutritional quality, relative to the no label condition.
At the end of the survey, participants were asked whether they recalled seeing the FoPL to which they were exposed. The study protocol has been described in detail elsewhere: http://www.ANZCTR.org.au/ACTRN12618001221246.aspx
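The allocation and order randomisation described in the procedure can be sketched as below. This assumes simple (unrestricted) randomisation with equal probability per arm; the text does not say whether blocking or stratification was used, and all names in the code are illustrative.

```python
import random

FOPL_ARMS = ["HSR", "MTL", "Nutri-Score", "RIs", "Warning symbol"]
CATEGORIES = ["pizzas", "cakes", "breakfast cereals"]
PRODUCTS = ["lower", "intermediate", "higher"]  # nutritional quality levels


def assign_participant(rng: random.Random) -> dict:
    """Draw a FoPL arm and shuffled presentation orders for one participant."""
    return {
        # Assumption: each arm is equally likely and drawn independently.
        "arm": rng.choice(FOPL_ARMS),
        # Order in which the three food categories appear on screen.
        "category_order": rng.sample(CATEGORIES, k=len(CATEGORIES)),
        # Order of the three products within each category.
        "product_order": {
            category: rng.sample(PRODUCTS, k=len(PRODUCTS))
            for category in CATEGORIES
        },
    }


# A seeded generator makes the allocation reproducible for auditing.
allocation = assign_participant(random.Random(2018))
```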
2.4. Statistical Analysis
Sociodemographic and lifestyle characteristics and FoPL recall were summarised by country and for the full sample. If a participant reported never purchasing products from a particular food category, their response to the corresponding ranking task was excluded. Next, for each participant and food category, the number of correct responses was calculated for the no label and the FoPL tasks. A ranking was considered correct if all three products were ranked in the expected order and incorrect if any of the products were ranked out of order. The change in the number of correct responses across the three food categories from the no label to the FoPL condition was computed for each participant and expressed as a percentage.
The main outcome variable was the change in the number of correct responses between the FoPL and no label conditions. This was computed for each food category, leading to a category score of between −1 (deterioration) and +1 (improvement), with 0 denoting no change. Participants’ scores were then summed across the three categories, resulting in a final global score ranging from −3 to +3. Given the limited number of response options for the outcome variable, multivariable ordinal logistic regression was used to evaluate the association of FoPLs with change in the ability to correctly rank products from the no label to the FoPL conditions. Given the previous lower performance of the RIs reported in the literature, this FoPL was used as the reference category in the ordinal logistic regression models. Individual characteristics taken into account as covariates included sex, age, educational level, household income, involvement in grocery shopping, and self-estimated nutritional knowledge and diet quality. Variables displaying statistical significance at the p-value < 0.25 level in bivariate models were included in the multivariable model. For analyses including the full sample, the country was also included as a covariate. Sensitivity analyses were performed following exclusion of participants who did not recall seeing the FoPL during the survey. A false discovery rate approach was used to take into account multiple comparisons. A p-value below 0.05 was considered statistically significant. Statistical analyses were carried out using the full sample and by country, for all food categories combined and by individual food category, using SAS Software (version 9.3, SAS Institute Inc., Cary, NC, USA).
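The outcome construction described above can be sketched as follows. The data layout (a tuple of products in rank order, `None` for an "I don't know" answer) and the function names are assumptions made for illustration. Counting "I don't know" as incorrect is also an assumption, since the text does not state how it was handled.

```python
# Expected order from rank 1 (highest quality) to rank 3 (lowest quality).
EXPECTED_ORDER = ("higher", "intermediate", "lower")


def ranking_correct(ranking) -> bool:
    """A ranking is correct only if all three products are in the expected order."""
    return ranking is not None and tuple(ranking) == EXPECTED_ORDER


def category_score(no_label_correct: bool, fopl_correct: bool) -> int:
    """-1 = deterioration, 0 = no change, +1 = improvement."""
    return int(fopl_correct) - int(no_label_correct)


def global_score(responses: dict) -> int:
    """Sum category scores; excluded categories are simply absent from `responses`.

    `responses` maps category -> (no_label_ranking, fopl_ranking).
    """
    return sum(
        category_score(ranking_correct(before), ranking_correct(after))
        for before, after in responses.values()
    )


example = {
    "pizzas": (("intermediate", "higher", "lower"), EXPECTED_ORDER),  # +1
    "cakes": (None, EXPECTED_ORDER),                                  # +1
    "breakfast cereals": (EXPECTED_ORDER, EXPECTED_ORDER),            #  0
}
example_score = global_score(example)
```

With three categories the global score is bounded by −3 and +3, which is why an ordinal rather than a linear model suits the outcome.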
Between April and July 2018, 12,015 participants responded to the online survey and were included in analyses (Table 1
). The average time spent by the participants on the online questionnaire was 10.7 min, resulting in 0.45 min per item. Overall, 33.8% of participants had an undergraduate degree, 74.5% were responsible for grocery shopping, 64.9% reported having a mostly healthy diet, and 60.8% reported being somewhat knowledgeable about nutrition. Across the whole sample, 62.2% of participants recalled seeing the FoPL to which they were randomised. The two FoPLs with the lowest rate of recall were the Warning symbol (48.4%) and HSR (56.5%).
The number of correct responses by food category by FoPL is presented in Figure 2
. All five FoPLs improved the number of correct responses in the ranking task compared with the no label situation. However, large disparities were observed among the labels. For all countries combined, the Nutri-Score elicited the largest increase in the number of correct responses compared with the no label situation (+47% for pizzas, +229% for cakes, and +95% for breakfast cereals). This was followed by the MTL (+30% for pizzas, +143% for cakes, and +50% for breakfast cereals), the HSR (+19% for pizzas, +87% for cakes, and +46% for breakfast cereals), and the Warning symbol (+13% for pizzas, +92% for cakes, and +40% for breakfast cereals). Finally, the RIs elicited the smallest increase in the number of correct responses (+12% for pizzas, +17% for cakes, and +27% for breakfast cereals). Overall, similar patterns were observed in each country (data not shown).
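The percentages above read most naturally as relative changes in the number of correct rankings, since an increase above 100% could not be a difference in percentage points. Under that assumption, and writing $N_{\text{none}}$ and $N_{\text{FoPL}}$ (symbols introduced here) for the number of correct responses without and with the label, the change is:

```latex
\[
\Delta_{\%} \;=\; 100 \times \frac{N_{\text{FoPL}} - N_{\text{none}}}{N_{\text{none}}}
\]
```

For instance, with purely hypothetical counts of $N_{\text{none}} = 100$ and $N_{\text{FoPL}} = 147$, the change would be $+47\%$; a change of $+229\%$ would correspond to more than a threefold increase (e.g., 100 to 329).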
Associations between FoPLs and change in participants’ ability to correctly rank products according to their nutritional quality are displayed in Table 2
. In the full sample, all FoPLs significantly outperformed the RIs. However, as before, the magnitude of the effect differed according to FoPL. The Nutri-Score was associated with the highest improvement in ability to correctly rank product healthiness (Odds Ratio [95% confidence interval]: OR = 3.07 [2.75–3.43], p
-value < 0.0001), followed by the MTL (OR = 1.77 [1.59–1.98], p
-value < 0.0001), the HSR (OR = 1.37 [1.23–1.53], p
-value < 0.0001), and the Warning symbol (OR = 1.28 [1.15–1.43], p
-value < 0.0001). Furthermore, the Nutri-Score performed the best in all 12 countries, with ORs ranging from 2.14 [1.48–3.10] (p
-value = 0.001) in Argentina to 4.45 [3.02–6.56] (p
-value < 0.0001) in Singapore. The results for the remaining FoPLs were heterogeneous across countries; however, in most instances the MTL was the second-best performing label after the Nutri-Score. The HSR and the Warning symbol also significantly outperformed the RIs in most countries, but the effects were weaker. Similar trends were found when analyses were performed separately for each food category, with FoPLs appearing somewhat more effective in the cake products category compared with the other two categories (Table S1).
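To help interpret the odds ratios reported above, the sketch below recovers the approximate standard error on the log-odds scale from a published interval. It assumes the intervals are Wald-type, i.e. symmetric around the estimate on the log scale. This is common for logistic regression output, but the text does not confirm it. The function name is illustrative.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_or_summary(odds_ratio: float, ci_lower: float, ci_upper: float) -> dict:
    """Back out log-scale quantities from an OR and its 95% CI."""
    log_lower, log_upper = math.log(ci_lower), math.log(ci_upper)
    return {
        "log_or": math.log(odds_ratio),
        # Half the CI width on the log scale, divided by the normal quantile.
        "se": (log_upper - log_lower) / (2 * Z_95),
        # If the interval is symmetric on the log scale, this should be
        # close to the reported OR (up to rounding).
        "midpoint_or": math.exp((log_lower + log_upper) / 2),
    }


# Full-sample Nutri-Score estimate reported in the text: 3.07 [2.75-3.43].
summary = log_or_summary(3.07, 2.75, 3.43)
```

For the Nutri-Score estimate, the log-scale midpoint of the interval lands very close to the reported OR of 3.07, which is consistent with the symmetry assumption.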
In sensitivity analyses including only participants who recalled seeing the FoPL during the survey, higher magnitudes of effects were observed, and the order of FoPLs according to improvement in participants’ ability to correctly rank the nutritional quality of food products was slightly modified (Table S2
). In the full sample for all food categories, the Nutri-Score performed best compared to the RIs (OR = 3.64 [3.20–4.14], p
-value < 0.0001), followed by the Warning symbol (OR = 2.00 [1.74–2.31], p
-value < 0.0001), the MTL (OR = 1.87 [1.65–2.12], p
-value < 0.0001), and the HSR (OR = 1.76 [1.54–2.02], p
-value < 0.0001). Similar trends were observed across countries.
In the present study, all five FoPLs significantly improved the ability of individuals to rank products according to their nutritional quality, but with notable differences across FoPL types. Compared to the RIs, which emerged as the least effective FoPL, the Nutri-Score produced the highest improvement in ranking ability, followed by the MTL, HSR, and Warning symbol. Similar trends were observed across all three food categories and all 12 countries. However, the non-significant results in individual countries may be partly explained by multiple testing corrections and lack of sufficient statistical power for some of the models.
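The effect of a multiple testing correction on borderline results can be illustrated with a short sketch. The study reports only "a false discovery rate approach"; the Benjamini-Hochberg step-up procedure used here is an assumption, and the p-values are invented for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Step-up rule: find the largest rank k such that p_(k) <= (k / m) * alpha,
    then reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * alpha:
            max_rank = rank
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[index] = True
    return rejected


# Hypothetical p-values for four label-versus-RIs comparisons in one country.
decisions = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

In this invented example, the comparisons with p = 0.03 and p = 0.04 would each pass an uncorrected 0.05 threshold but are not rejected after correction. That is the mechanism by which country-level effects can lose significance.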
The fact that all FoPLs were associated with a significant improvement in food healthfulness ranking ability compared to a no label situation is consistent with the literature, suggesting that FoPLs can help consumers discriminate between the nutritional quality of different food products and identify healthier food choices [25
]. In addition, the interpretive FoPLs (Nutri-Score, MTL, HSR, and Warning symbol) significantly outperformed the non-interpretive label (RIs), which is in line with the results of prior studies [31
]. The comparatively weak performance of the RIs may be explained in particular by its reliance on numeric information (grams and percentages), and its evaluation per portion [30
]. Nutrient-specific labels providing only numerical information have been consistently found to be poorly understood by consumers, in particular by those with low educational levels, as they entail a high cognitive workload to interpret [26
]. However, even though interpretive labels clearly outperform non-interpretive ones, design features are also likely to result in varying degrees of FoPL effectiveness. Hence, it appears important to better understand the characteristics of interpretive FoPLs that improve consumers’ understanding of the nutritional value of foods.
Given the findings of the present study, two major features appear to influence FoPL understanding: use of colours and summary versus nutrient-specific information. Interpretive FoPLs associated with the highest increase in objective understanding were the Nutri-Score and the MTL, which were the only two colour-coded labels among the five FoPLs tested. It has been demonstrated that the use of colours is key regarding FoPL salience, as colours tend to capture attention [27
]. Moreover, the use of the well-known polychromatic green-red scale might be an important feature of FoPL colour coding. Indeed, green and red colours, corresponding to recognised signals, may be easier to understand and interpret, with green being associated with safety and a “go” signal, and red being associated with danger and a “stop” signal [33
]. Thus, the presence of a colour-coded FoPL may be effective at different stages of information processing: at an early stage by drawing attention to the label and at a later stage by aiding understanding [50
]. In contrast, the HSR and the Warning symbol, which are monochromatic labels, were the two interpretive FoPLs with the lowest percentage of participants recalling seeing the label during the survey and the weakest performance regarding objective understanding. In sensitivity analyses, when considering only participants recalling seeing the FoPL, the results for the Warning symbol were substantially improved. This suggests that this type of nutrition label is well understood once identified and might even result in improved effectiveness if presented in more salient colours [49].
The other key element of an FoPL’s format that may influence its ability to increase understanding of nutrition quality is the presence of a summary indicator rather than merely nutrient-specific information. Indeed, among the colour-coded FoPLs tested in the study, the Nutri-Score summary label performed notably better than the nutrient-specific MTL. This finding is consistent with prior studies’ findings that summary indicators are more easily understood by consumers [27
] and limit potential confusion related to the interpretation of nutritional terms (e.g., saturated fats, sugars, and sodium) [52
]. These FoPLs provide synthesised information that may be associated with a reduced cognitive workload, resulting in faster processing and less difficulty in understanding the meaning of the information provided [30
]. While the MTL provides five different pieces of information on specific nutrients, the Nutri-Score summarises the overall nutritional quality of the product. Generally, these types of nutrition labels appear to be more efficient and useful tools with which to influence consumers’ choices at the point-of-purchase where decisions are made in a very short time period [40
]. Hence, the stronger performance of the Nutri-Score regarding objective understanding may be related to its use of the combination of both semantic colours and a simple and intuitive summary graded design.
In the present study, similar patterns of consumers’ objective understanding of the FoPLs were observed across the 12 countries, with comparable magnitudes of effects, even if the geographical area and food cultural background are quite different. More specifically, the Nutri-Score showed greater effectiveness compared with the other four FoPLs, even in countries where an alternative official FoPL is already implemented. That was notably the case in the UK, where the MTL was introduced on pre-packed foods in 2004, and Australia, where the HSR system has been applied on food packages since 2014. In these two countries, the Nutri-Score performed better than the MTL and the HSR, respectively, suggesting that its graphical assets may outweigh any potential benefits of familiarity. This finding is consistent with the results of a study that compared evaluation, use, intentions, and product choices among three nutrition labels in two countries with different FoPL histories [53
]. The authors observed that familiarity with a FoPL influenced self-reported evaluations and use intentions only, but all FoPLs were equally effective in encouraging healthier food choices. This homogeneous result across countries may be partly explained by the fact that these key elements of interpretation and, more specifically, the use of colour-coding with the green-red polychromatic scale are internationally understood. Indeed, given the specific neurobiological aspects of colour recognition in humans, green/red cues are considered to be the most easily differentiated colours [54
]. In the present study, very few disparities were found among countries, with only a small number of instances in which specific FoPLs were more strongly associated with objective understanding in some but not other countries. For example, the HSR effect was significant in Australia, Bulgaria, and Singapore only, and the Warning symbol was significant in Singapore only compared to the RIs. These limited instances of discrepancies in understanding and use of FoPLs among countries may be partly attributed to the local context and the impact and strength of the public discourse on nutrition and labelling [41].
Strengths of this study included the large sample size and the recruitment of participants in 12 countries from different continents (Europe, North and South America, Asia, and Oceania) that facilitated cross-cultural comparisons of FoPL effects. In addition, the use of sets of three food products (rather than evaluation of sets of two as is often done in other studies) approximated realistic situations while decreasing the risk of correct responses simply by chance. Furthermore, the stimuli were developed to ensure that a clear nutritional difference between the products was communicated by the information provided by each FoPL to facilitate the ranking process. However, these methodological choices led to the exclusion of endorsement schemes from the test, as understanding of these FoPLs is difficult to assess across more than two products at once (e.g., no discrimination would be possible between two products without any endorsement labels on their packages). Finally, a potential learning effect was also controlled for through the randomisation of the order of presentation within the sets and across food categories.
However, some limitations of the study should be acknowledged. A primary limitation was the use of a web panel using set quotas across countries rather than attempting to generate population representative samples. Thus, caution is required regarding extrapolation of the results. However, participants in all 12 countries were recruited using the same methods and criteria. Second, results may have been influenced by familiarity in cases where one of the five FoPLs was already implemented in a particular country. However, this was taken into account by adjusting for country in the analyses including the full sample. Third, participants did not have access to the nutritional composition of the products used in the study, which differs from real-life situations in which consumers would often be able to access more detailed nutrition information on the back of the pack. This might have led to fewer correct responses in the no label situation than in real-life settings. However, it has been demonstrated that back-of-package information is rarely considered when grocery shopping [54
]. Finally, the study was conducted as an online experiment and not in a real shopping situation, in which many additional factors are likely to influence consumers’ perceptions and choices. Indeed, time pressure and the familiarity of consumers with specific food products and brands may influence purchasing choices, whereas in the present study the time allowed for questionnaire completion was not limited and fictional foods were used.