Secondary Outcomes of a Front-of-Pack-Labelling Randomised Controlled Experiment in a Representative British Sample: Understanding, Ranking Speed and Perceptions

Front-of-pack labels (FOPLs) provide simplified nutritional information that aims to inform consumer choice and encourage reformulation. We conducted an online randomised controlled experiment on a representative British sample to test the effectiveness of FOPLs across a range of outcomes. The primary outcomes have been published; here, we present the secondary outcomes: the ability to rank the healthiest product and the time to complete the rankings by comparing the FOPL groups and a no-label control, as well as a descriptive analysis of the perceptions. Participants from the NatCen panel were randomised to one of five experimental groups (Multiple Traffic Lights; Nutri-Score; Warning Label; Positive Choice tick; no-label control). Six food/drink categories were selected (pizza, drinks, cakes, crisps, yoghurts, breakfast cereals), and three products were created with varying healthiness. The participants (analytic sample = 4530) were asked to rank the products in order of healthiness twice (baseline: no label; follow-up: experimental group label). Compared to the control, the probability of correctly ranking the healthiest product at follow-up was significantly greater for the N-S, MTL and WL across all products. The time to correctly complete the ranking was fastest for the N-S, PC and no-label control. The descriptive analysis showed that the FOPLs were perceived favourably, and especially N-S and MTL. The findings were supportive of the primary analyses, with those results suggesting that N-S performed the best, and then MTL.


Introduction
Front-of-pack labels (FOPLs) are simplified nutritional labels that are displayed on the front of food-and drink-product packaging. The aim of FOPLs is to provide simple and easy-to-understand information to consumers to aid healthier choices and to drive reformulation [1,2]. Consumer-friendly FOPLs that are easy to understand or interpretative have been recommended for use by the World Health Organisation (WHO) since 2014 [3]. There are many different FOPLs that are used globally, which vary by the level and type of information provided, the colour and the shape. Prominent labels include: were recruited to take part in an online experiment. The NatCen panel is nationally representative of Great Britain (GB) and is a probability-based sample. To take part in the experiment, the participants were required to be adults (aged 18 years or older), to be able to read and write in English and to complete the survey online. All active panel members were invited by email, post or SMS to participate, and they were offered a small financial incentive for completing the experiment. Data collection occurred between 28 October and 15 November 2020. Ethical approval was granted by NatCen's Research Ethics Committee (application reference: P15640).

Materials
Six food and drink categories were selected (pizza, instant hot chocolate, cake, crisps, yoghurt, breakfast cereal). Three mock products were created with varying nutritional compositions for each category: a "most healthy", an "in between" and a "least healthy". Mock images of the products were created by a graphic designer, and no nutrition or quality claims were included (gluten free, high in protein, etc.). Full details of the study protocol are available on Open Science Framework (https://osf.io/k9v2p/, accessed 1 January 2022).

Randomisation
Randomisation was stratified on the basis of key variables previously collected by NatCen: the year of recruitment to the panel; sex; age; government-office region; and household income. SPSS (Version 26.0, IBM Corp, Armonk, NY, USA) was used by Nat-Cen to randomly allocate equal numbers of participants to each of the five experimental groups (MTL, N-S, WL, PC or control). The allocation was concealed from participants and researchers.

Ranking Task
Participants viewed images of three products and were asked to rank them in order of healthiness, from most to least healthy. Participants completed the ranking tasks once with no FOPLs (baseline ranking tasks), and then again with their assigned FOPL shown on the products (follow-up ranking tasks). A total of 12 ranking tasks were completed by each participant (six product categories by two ranking conditions: baseline and follow-up). Participants were asked if they had enough information to rank the products after each task at both baseline and follow-up (12 occasions), and how confident they were after each set (baseline and follow-up, two occasions). Participants were able to zoom in on the image, but no other information was presented (including no back-of-pack information). To minimize missing data, participants could only select "don't know" after attempting to move to the next page without completing the ranking task, and software prevented ranking two products the same. The presentation of categories and products within categories was randomised. The time taken to complete rankings was taken from the survey software, which recorded the length of time (seconds) spent on each web page (each ranking task was on a separate page). Participants were advised to click next as soon as they had completed each task (see Supplementary Table S1 for full questionnaire). The healthiest-product outcomes and the speed-of-ranking outcomes were based on the ranking tasks. None of the FOPLs were introduced or explained to participants.

Label Perceptions
After the follow-up ranking tasks, to check that labels had been seen, participants were shown an image of their label condition and were asked if they had seen the FOPL. They were then asked if they had used the FOPL to complete the rankings, how easy/difficult they found the labels to understand, if they wanted the labels on packaging in the United Kingdom, how helpful in choosing what to buy they would be and views on how long it takes to use the labels. These questions were not applicable to the control group, as they did not see any FOPLs.

Measures
The primary outcome of the study was the ability to correctly rank products according to healthiness, the results of which have been published [18]. The secondary outcomes of the experiment presented in this paper are detailed below.

Secondary Outcomes
Healthiest-product ranking outcomes were examined in three ways:
Change in healthiest-product global food score, aggregated change in rankings for the five food products (score range was from -5 to +5).
Speed of ranking:

4.
Time to complete each ranking (seconds) at baseline and follow-up.

Descriptive Outcomes
Label perceptions, specific to the participants' label conditions (control group was excluded):

5.
Saw the label in the ranking tasks-dichotomised as yes vs no/not sure; 6.
Used the label in the ranking tasks-all/some/did not use; 7.
View on implementation of label in United Kingdom-support for mandatory labelling (yes-all); support for voluntary labelling (yes-some); no support for labels (no-none); 9.
View on helpfulness of label for food shopping-dichotomised as "helpful" (very helpful/quite helpful) vs "not helpful" (not very helpful/not at all helpful); 10. View on time to use label when shopping-quick enough/too long.

Participant Characteristics/Covariates
Food shopping and eating habits were assessed by asking food-shopping responsibility, label use when shopping, influence of nutritional information on shopping, healthy-eating knowledge, healthy-eating interest, if currently trying to lose weight (see Supplementary Table S1 for questionnaire). Demographics and health-related questions included height and weight, pregnancy status, physical or mental health conditions that affect vision/learning, understanding or concentration/diet for 12 months and English as a second language. As panel members of NatCen, certain demographic information was already known (sex, age), and other key demographic information was only asked if it had been more than 6 months since last updated (income, educational level). Additional information on household composition (children under 16 years) was also collected.

Statistical Analysis
The experiment was powered on primary outcomes, and so testing on secondary outcomes was limited. We limited formal statistical analyses to healthiest-product outcomes and speed-of-ranking outcomes, with a focus on the N-S-and-MTL comparison. For the healthiest-food-outcome analyses, we used the same three tests as for the primary analyses [18]: multilevel log-binomial regression analysis to compare the proportion of correct rankings between baseline and follow-up; log-binomial regression to compare the change in rankings from baseline to follow-up; and multiple linear regression analysis to compare the change in the global food score from baseline to follow-up. For the speedof-ranking analyses, we log-transformed the data to normalize the distribution, and then used regression analysis to test the association between the time taken to rank products at follow-up, before back transforming for interpretation. For these four outcomes, we compared each FOPL group with the control, and additionally compared MTL and N-S.
All models were adjusted for the five stratification factors used for randomisation (year of recruitment to panel, sex, age, government-office region, and household income) and the following prespecified covariates: ethnicity, highest education level, household composition, food-shopping responsibility and current FOPL use. For product-level analyses, we excluded participants if they self-reported not buying or consuming that product in the last 12 months to ensure that a lack of familiarity with a product did not impact results (see Supplementary Table S3 for sensitivity analysis with no requirement to consume products). For global-food-score analysis, we only included participants who reported buying or consuming all five food products, and so a lack of familiarity with a product did not impact the results. For speed of ranking, we required participants to correctly rank the product under the label condition (see Supplementary Table S4 for sensitivity analysis with no requirement to be correct at follow-up). Additionally, for the speed-of-ranking outcome, we adjusted for device used, as mobile users were found to be faster (coded as mobile, desktop or both-as participants could leave and rejoin the survey; see Supplementary Table S2 for the proportion of participants and devices used).
Models were weighted to account for non-responses and to ensure findings were representative of the British adult population. Stata software (Release 16, StataCorp LLC., College Station, TX, USA) was used for all analyses, and a significance level of 5% was used [23]. Output for healthiest-product outcomes was presented as relative risk (RR) of linear regression coefficient, with 95% confidence intervals, and output for speed of ranking was presented as relative mean (RM) (i.e., the ratio of the mean outcome in one group relative to the mean outcome in another group). The associations between label perceptions and FOPL groups are presented descriptively. Descriptive findings for the global change score (see primary outcome paper for details [18]) are presented by equivalized income per month (more than GBP 2000; GBP 1251-2000; GBP 801-1250; GBP 800 or less).

Participant Characteristics
NatCen invited 7218 panel members to participate, of whom 4863 accepted the invitation and were randomized to five experimental conditions: MTL (n = 968), N-S (n = 985), WL (n = 967), PC (n = 966) and control (n = 977). Complete data were available for 4530 participants and were included in these analyses. The participant characteristics for the food and shopping variables are presented in Table 1 (see Supplementary Table S4 for participant by characteristics by experimental group, and primary analyses for full baseline characteristics). The analysis sample is broadly representative of the British population and of the full NatCen sample [24]. There was a high proportion of participants who reported being very or quite interested in healthy eating (94%) and having some or a lot of knowledge about healthy eating (86%). Nearly half of the sample reported that they were currently trying to lose weight (47%).  Figure 1 shows the percentage of participants who correctly ranked the healthiest product at baseline and follow-up, and the percent change by experiment group and product category (see Table S5 for the number and proportion of participants correctly ranking at baseline and follow-up). The associations between the experimental group and the correct ranking of the healthiest product at follow-up, adjusted for covariates, are shown below in Table 2. The probability of participants correctly ranking the healthiest product at follow-up was significantly greater across all product categories in MTL, N-S and WL, compared to the control. N-S had the highest RR of correctly ranking the healthiest product, followed by MTL and WL. The PC group had mixed results, with significant differences for drink, yoghurt and cereal, but not for pizza, crisps or cake (no products in the cake and crisps categories were eligible for a PC). There were no significant differences between MTL and N-S in the direct comparison. We found that the proportion of participants correctly ranking the healthiest product at baseline was similar across the FOPL conditions.   The associations between FOPLs and the improved-change score, adjusted for covariates, are shown in Table 3. The probability of participants improving their score from baseline to follow-up was significantly greater across all product categories in MTL, N-S and WL, compared to the control. N-S led to the greatest probability of improved scores, followed by MTL and WL. PC had mixed results, with a greater probability of improved scores for drinks, yoghurt and cereal, but not pizza, cake or crisps. The comparison between N-S and MTL showed no significant difference. Table 3. Log-binomial regression results-relative risk that ranking of healthiest product improved (follow-up vs baseline) between FOPL group and control (adjusted for design factors and covariates). The associations between an improved healthiest-product global food score and the FOPL groups, adjusted for covariates, are shown in Table 4. We found that all of the FOPL groups were associated with a significant increase in the global food score for the healthiest product, compared to the control. A comparison between N-S and MTL showed that N-S led to a greater improvement in the healthiest-product global food score.

Speed of Ranking
The median and interquartile ranges of the time to complete the baseline and followup rankings in seconds, by experimental group and product category, are shown in Appendix Table A1. The baseline median ranking times were similar within product categories and between groups. The follow-up rankings were faster compared to the baseline rankings, for all categories and experimental groups. Overall, there was variation between the product categories at baseline (cake was the quickest, yoghurt and cereal were the longest), but there was little variation between the product categories at follow-up. In a comparison between N-S and MTL, N-S was fastest across all categories. The median ranking times and the interquartile ranges, in seconds, of the summed follow-up ranking times (seconds) by experimental condition are shown in Figure 2. The associations between the time to complete the rankings of each product and the FOPL conditions, adjusted for covariates, are shown in Table 5, with a higher RM indicating that the time to complete the ranking was slower (i.e., worse) (see Supplementary Table S6 for sensitivity analysis without the requirement for being correct at follow-up). By FOPL group, for pizza, we found that the time to complete the ranking significantly increased by 25% for MTL, and by 28% for WL, compared to the no-label control; there were no significant differences for N-S or PC compared to the no-label control. The pattern was broadly consistent across the product categories. A comparison between N-S and MTL showed that the time to complete the ranking decreased by 15-22% for the participants in the N-S group compared to the MTL group, across all product categories (all p < 0.001). Table 5. Multiple regression analysis results-association between time taken to rank products at follow-up and FOPL group compared to control (adjusted for baseline ranking time, device used, design factors and covariates). All analyses were adjusted for the five stratification factors (year of recruitment to panel, sex, age, governmentoffice region, household income) and the following prespecified covariates: ethnicity, highest education level, household composition, food-shopping responsibility, current FOPL use, baseline ranking time and device used. Participants needed to have complete covariate information and buy/eat cereal/pizza to be included, and they needed to correctly rank the products. To deal with outliers, the data were log-transformed for analyses and then back transformed for interpretation. MTL: Multiple Traffic Lights; N-S: Nutri-Score; WL: Warning Label; PC: Positive Choice tick; RM: relative mean; CI: confidence interval.

Descriptive Analyses of Perceptions
Descriptive analyses of the label perceptions by FOPL group are shown in Table 6. We did not formally test the differences between the labels. These questions were specific to the label conditions that the participants were randomised to and were therefore not applicable to the control group, as they saw no labels. Overall, the perceptions of the labels differed greatly between the FOPL groups. The majority of participants favourably viewed labels and supported mandatory labelling in the United Kingdom. The N-S and MTL groups had the highest proportions of participants perceive labels favourably. N-S had the highest proportion of participants who reported: seeing the label; using the label for all ranking tasks; that the label was easy to understand; and that the label was quick enough to use. Most commonly, MTL followed. MTL had the highest proportion of participants who reported perceiving the label to be helpful in making purchasing decisions and who supported having labels on all food and drink products in the United Kingdom.

Descriptive Analysis of Ranking Outcome by Income
Descriptive analyses of the global food score by experimental group and participant income are shown in Table 7, with the higher global food score indicating greater improvement in ranking (see Supplementary Table S7 for sensitivity analysis without requirement to consume all food products). Between N-S and MTL, N-S appeared to be more stable at a high level and showed fewer differences in the global food score across income groups, compared to MTL. The experiment was not powered to formally test these differences. Table 7. Mean global food scores and standard deviations by experimental group and equivalised household income per month. Global food score is an aggregated score of correct ranking in the five food products, with a range from −5 to +5 (− indicates worsening and + indicates improvement). Participants required to buy/eat each food product in the last 12 months and to have full covariate information to be included. MTL: Multiple Traffic Lights; N-S: Nutri-Score; WL: Warning Label; PC: Positive Choice tick; SD: standard deviation.

Discussion
In this online randomized controlled experiment on a representative British sample, we found that N-S and MTL performed the best across a range of outcomes, including the correct ranking of the healthiest product and the speed of the ranking. The healthiest-product analysis found that N-S, MTL and WL performed the best, and PC performed the worst in the ranking tasks. In terms of timing, the analysis showed that participants who used N-S ranked significantly faster than those who used MTL, and the descriptive analysis of the label perceptions showed that FOPLs were perceived favourably overall, and especially N-S and MTL. This is consistent with previously published data on the primary outcomes (i.e., that N-S, followed by MTL, led to significant improvements in the ability of participants to correctly rank products according to healthiness).
Our findings are broadly consistent with previous research. The results from experimental ranking studies have shown that N-S performs the best for ranking tasks [10,11], and a meta-analysis found that summary indicator labels are the most effective at helping consumers identify the healthiest products [16]. In the current study, we conducted the healthiest-product analysis to supplement the primary analysis of ranking all three correctly in order to allow for a fairer comparison for PC (except cake and crisps, which did not qualify). Since PC is a binary label, it cannot differentiate between three products. Despite this, PC was still found to be the worst-performing label. For the time needed to use labels, no studies have compared N-S and MTL, to the best of our knowledge, but experimental studies conducted in Uruguay and Mexico found that MTL and WL were faster compared to the Guideline Daily Amounts [25,26]. We found that, in a large representative British sample, N-S was significantly faster to use than MTL for the follow-up ranking task, and, crucially, it also led to the highest proportion of correct rankings in the primary analyses, compared to the other experimental groups. This is interesting as the MTL label is currently endorsed and used on a voluntary basis in the United Kingdom, and it is therefore likely to be familiar to most participants. N-S provides an overall summary of the product healthiness, compared to MTL, which contains more information, including nutrient-specific information. Experimental studies that tested the perceptions of FOPLs (including MTL and N-S) following use in the ranking tasks found that the respondents viewed the labels favourably, in general, and that they strongly supported mandatory labelling across all label conditions [27][28][29]. They found that the participants who viewed MTL responded most favourably across 12 countries [27], but later replications in Belgium and the Netherlands found no difference [28,29]. A large cross-sectional study in France found N-S was most favourably perceived by the participants, who also rated MTL, a reference intake label and a simplified nutrition-labelling system, including the quickest to process [30].
Time has been identified as a factor that impacts upon actual label use, and so our finding that N-S is the fastest to use while also leading to the highest proportion of correctly ranked products suggests that N-S may be a more effective label than the other labels that we tested [8,9,31,32]. The ranking tasks in this study are not representative of how long the tested labels would take to use in real decision-making scenarios with additional factors, such as costs, preferences, and product claims, but we believe that these factors are likely to be broadly consistent, regardless of the FOPL used. Indeed, research indicates that people spend less time making purchasing decisions than the median time shown to complete the ranking tasks, with observational studies in the United Kingdom and Uruguay showing that consumers spend, on average, 22-28 s choosing a product, and only 7 s from picking up a product to putting in a basket [6,33].
As was previously stated, none of the labels were explained to the participants, and we assumed that prior exposure to labels other than MTL (in voluntary use in England) was low; therefore, we expected a low subjective understanding of these labels. Both N-S and PC are abstract labels that may need some explanation, which is contrary to MTL and WL, which are self-explanatory. Because of this, the positive perceptions of N-S and MTL indicate that both summary indicator and nutrient-specific labels may be liked by British consumers. MTL is the FOPL that is currently used in the United Kingdom on a voluntary basis, and so the understanding and use of this label is assumed to be higher compared to the other labels. Understanding and familiarity combined may have negatively impacted on the confidence in using the other labels, compared to MTL [7,34].
The secondary outcomes of timing and perception that are presented here relate to the actual use of FOPLs, as opposed to the ability of FOPLs to improve the understanding of product healthiness. For FOPLs to be effective, they not only need to easily communicate nutritional information about the product, but they also need to be useable, and especially for people with lower education. This is vital, as a key aim for policymakers is to reduce, and not widen, inequalities. The descriptive results indicate the N-S may work better across all income groups, as the global-food-score results were more stable, compared to MTL and WL. We found a similar result when examining the proportion of correct/incorrect by education level [18].
Despite being a high-quality experimental study, there were limitations of the secondary analyses. The study was powered on the primary aims, and so we limited the statistical testing of the secondary aims. The healthiest-product analysis aimed to deal with the limitations of the PC label, as it does not provide sufficient information to rank products because it is a binary label. A further limitation of the PC label is that it did not qualify for use in two of the food categories (crisps, cakes), which is reflected in the responses of the participants in the PC group who reported seeing the label (lower compared to the other FOPL groups). This is reflective of the real-world application and limitations of the PC label, as many products would not qualify and, therefore, would not have a label or provide any information to consumers. Data from the software was used for the timing outcomes, and the participants were given the following instructions: "we would like to know how long it takes for you to rank the different foods, so please click 'next' as soon as you have completed each task." These directions were provided not only to make the participants aware that we would collect this data, but also so as not to rush them, as timing was not the primary outcome and, therefore, care needs to be taken with the interpretation of the findings. The participants only answered perception questions about the FOPL that they used, and, therefore, we cannot compare preferences between labels. This was not only partly to reduce the time burden on the participants, but it also meant that the responses to these questions were based on the actual use of the label. The sample was representative of adults in Great Britain and was broadly comparable to the full NatCen panel and to the British population, including the percentage who reported wanting to lose weight [24,35,36].

Conclusions
The results from this randomised controlled experiment show that FOPLs can be used quickly and effectively to improve the understanding of product healthiness, and that they are perceived favourably in a large representative British sample. The findings were consistent with the primary report, and they show that N-S performed the best across the outcomes, and that it was closely followed by MTL; all FOPLs were better than the no-label control. The N-S label was also the quickest to use correctly. The FOPLs were perceived favourably overall, and especially MTL and N-S, with strong support for mandatory labelling on all products in the United Kingdom. The findings of this experiment extend the evidence and provide support for FOPLs in the United Kingdom, which can inform future policy decisions.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/nu14112188/s1, Table S1: Full questionnaire; Table S2: Device usage by age category; Table S3: Summary of baseline and follow-up ranking times (seconds), by product category, overall and FOPL group (no requirement to consume product to be included in analysis); Table S4: Individual characteristics of the analysis sample, by experimental group; Table S5: Summary of participants who correctly ranked the healthiest product at baseline and followup, by FOPL group and product category; Table S6: Multiple regression analysis results-association between time taken to rank products at follow-up and FOPL group compared to control-without requirement for being correct (adjusted for baseline ranking time, device used, design factors and covariates); Table S7: Mean global food score and standard deviations by experimental group and equivalised household income per month-without requirement of buying/eating all food products.