Comparing the Rate-All-That-Apply and Rate-All-Statements Question Formats across Five Countries

Rate All That Apply (RATA) is a derivative of the popularly used Check-All-That-Apply (CATA) question format. For RATA, consumers select all terms or statements that apply from a given list and then continue to rate those selected based on how much they apply. With Rate All Statements (RATING), a widely used standard format for testing, consumers are asked to rate all terms or statements according to how much they apply. Little is known about how the RATA and RATING question formats compare in terms of aspects such as attribute discrimination and sample differentiation. An online survey using either a RATA or RATING question format was conducted in five countries (Brazil, China, India, Spain, and the USA). Each respondent was randomly assigned one of the two question formats (n = 200 per country per format). Motivations for eating items that belong to five food groups (starch-rich, protein-rich, dairy, fruits and vegetables, and desserts) were assessed. More "apply" responses were found for all eating motivation constructs with RATING than with RATA. Additionally, the standard indices showed that RATING discriminated more among motivations than RATA. Further, the RATING question format showed better discrimination ability among samples for all motivation constructs than RATA within all five countries. Generally, mean scores for motivations were higher when RATA was used, suggesting that consumers who might choose low numbers in the RATING method decide not to check the term in RATA. More investigation into the validity of RATA and RATING data is needed before use of either question format over the other can be recommended.

For the Check-All-That-Apply (CATA) or Mark-All-That-Apply format, respondents select all attributes or statements that apply from a given list. Ease of use and a low burden on respondents are the key reasons why CATA has gained popularity in recent years [2,15–18]. However, the format has been criticized for the equivocal interpretation of unchecked attributes among the listed options [1,4,5,19]. Conversely, the Check-All-Statements (CAS) or forced-choice yes/no question format requires respondents to provide a response (e.g., yes/no or agree/disagree) for each attribute or statement to indicate whether it applies. Although CAS is immune to primacy bias (attributes at the top of the list being marked more frequently than those at the bottom), which is prevalent with CATA questions, CAS has been associated with acquiescence bias, where respondents tend to mark or agree with the positive connotation for all survey questions [1,4,20–22].

Rate All Statements (RATING)
The Rate-All-Statements (RAS) or simply the RATING question format uses intensity, or degree, scales to capture consumers' responses for each attribute or term in relation to the particular product(s) being investigated [7,9]. Cognitive psychologists and other consumer researchers have for over five decades considered the RATING question format the gold standard for measuring the intensity or degree of importance or applicability of product attributes [8,23–27]. Unipolar and bipolar scales (the latter presenting degrees of one attribute at one end and degrees of the opposite attribute at the other) are usually used with the RATING question format [28–32]. The length of these intensity scales can vary depending on the objective of the consumer study and the desired level of scale sensitivity or discrimination [28]. For example, an intensity scale can have 5 points, i.e., not at all important, slightly important, moderately important, very important, and extremely important, but can also be shorter, with just 3 points (low, medium, and high) [3,12].
Stevens [26] recommended that interval-scale data such as intensity, degree, or Likert scales be analyzed using parametric tests, i.e., comparing arithmetic means with t-tests or analysis of variance (ANOVA) to determine significance. The numerical value assigned to each node of the intensity scale allows for linear transformation of the data without loss of information [26]. However, treating these ordinal scales as interval scales has been met with controversy by various authors [33–37], who advocate for the use of non-parametric tests (e.g., chi-square tests and Spearman's rho) to analyze the ordinal data. One advantage of the RATING question format is that its scales collect more detail and also allow ranking of respondents' opinions, something that is not the case for question formats such as CAS and CATA, where respondents provide a yes/no response or a simple check for the terms that apply, respectively. Further, scales used for RATING questions are usually anchored on one end with terms such as "strongly disagree", "never", "none", and "not important at all", which give the respondent an "out" in case they do not find the particular term or attribute important or applicable to the product being examined. One notable disadvantage of the RATING question format is that the accuracy of responses can depend on the subject or topic being assessed. For instance, respondents may provide incorrect responses to socially sensitive questions (e.g., child abuse behavior and sexual habits) [38]. Further, considering that the RATING question format requires more thought than question formats such as CATA and CAS, the mean survey duration could be longer, which could increase the cost of consumer studies, since more time means more money.
Similarly, survey incompletion rates (non-response error) for the RATING format could be higher than those of other question formats such as CATA and CAS [39]. Nevertheless, the RATING question format remains the unofficial gold standard for product description questioning in consumer studies [23].

Rate All That Apply (RATA)
For the Rate-All-That-Apply (RATA) question format, after checking all terms that "apply" from a list of options (that is, a CATA question), respondents are asked to rate them using a scale that can range from 3 to 9 points [12–14,40,41]. Put simply, RATA is a combination of the CATA and RATING question formats [14,41]. Ng et al. [41] noted that although the CATA question format has become highly popular in recent years because of its ease and non-tedious structure, its degree of discrimination among samples, particularly samples with similar profiles, is limited. This inspired the development of a spin-off question format that added an intensity or degree scale (e.g., a 3-point or 5-point scale) to the CATA question structure [14]. Data collected using RATA questions can be analyzed in two ways. The first method involves treating RATA responses as CATA data and conducting analyses such as correspondence analysis. The second and recommended way of analyzing RATA data is to treat it as interval-scale data rather than ordinal data [3,12]. As such, unchecked attributes can be coded as zero. Meyners et al. [12] explain that if a 3-point scale was used to rate the checked terms, the data can be treated as a 4-point scale and analyzed using parametric tests such as t-tests and analysis of variance (ANOVA). Ideally, RATA would be expected to combine the best features of CATA and RATING, i.e., being easier to complete and less burdensome for respondents than RATING, while offering better sample discrimination and a more detailed sample description than CATA. However, when RATA is used in place of RATING, it could also inherit CATA limitations such as primacy bias and ambiguity in interpreting unchecked attributes. Vidal et al. [3], who compared CATA and RATA data collected from seven consumer studies, found that RATA data were not superior to CATA data.
In fact, those authors stated that the choice of RATA over CATA could be influenced by the overall research objective and the particular sample or product characteristics [3,14]. At the time of writing, no research had compared RATA and RATING data in terms of aspects such as discrimination among samples, discrimination among attributes, and non-response error (survey incompletion rates), which are important parameters that researchers use when deciding which question format to use in future consumer studies.
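The interval-scale coding described above (unchecked terms coded as zero, so a 3-point rating scale becomes a 4-point scale) can be sketched as follows. The data and the scale length here are hypothetical, chosen only to illustrate the coding:

```python
# Coding RATA responses as interval data: unchecked terms are coded 0,
# so a 3-point rating scale is effectively analyzed as a 4-point scale.
# Hypothetical responses for illustration only.
def code_rata(checked, rating):
    """Interval-scale code for one term from one respondent."""
    return rating if checked else 0

# Three respondents answering one term on a 3-point scale:
responses = [(True, 2), (False, None), (True, 3)]
codes = [code_rata(checked, rating) for checked, rating in responses]
mean_score = sum(codes) / len(codes)  # usable in t-tests or ANOVA
```

Coding unchecked terms as zero is what allows the parametric analyses (t-tests, ANOVA) mentioned above to include every respondent rather than only those who checked the term.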
Rapid advancements in information technology in recent years (for example, the increased access to faster and affordable internet in East Asia, North America, and Western Europe) have made online or web surveys a popular survey method for consumer researchers in multiple countries [42–45]. Online surveys are a cheaper, faster, and farther-reaching (larger numbers of respondents) data collection option than other survey methods such as in-person interviews, telephone interviews, or mail surveys. Additionally, the fact that several features can be added to online survey designs (including videos, audio clips, and product nutrition information labels) has made online surveys a staple for many consumer researchers [46]. Conrad et al. [47] suggested that the inclusion of dialog-like features in web survey designs could improve respondents' understanding of survey questions and the accuracy of collected responses. The RATA question format is one possible way of bringing human dialogue into online surveys.
The overall objective of the current study was thus to examine the characteristics of data collected using the RATA and RATING question formats in an online survey. Comparing RATA data with data collected using the gold-standard method could provide a better understanding of when researchers could use the RATA question format. Additionally, five versions (five languages/countries) of this online survey were conducted to assess the consistency or replicability of the data characteristics for the two question methods. Specific objectives for the questionnaire comparisons were to (a) compare the percentage of "apply" responses for RATA and RATING; (b) compare response distributions based on ratios of "apply" responses; (c) compare the mean scores for eating motivation constructs or terms for each food category within five countries; (d) identify the level of importance accorded to constructs by the RATA and RATING question formats; (e) compare the significant differences among food categories or samples for RATA and RATING; and (f) compare consumers' mean survey duration, survey liking, just-about-right (JAR) rating, and completion rates for the RATA and RATING survey formats.

Survey Structure
An eating motivation survey (EMS), which included questions on consumers' motivations for eating or not eating food items that belong to five food groups, was used for this online study [1,7]. The questionnaires were randomly assigned to respondents in either the RATA or RATING format but not both (each respondent saw only one format). A total of 47 positive motivation terms that could be categorized into 16 eating motivation constructs were assessed in each question format of the EMS [1,7]. Each eating motivation construct consisted of three terms or subscales, except for the choice limitation construct, which had only two terms. Details on why the authors used the 47 motivation terms and how "apply" responses for the subscales were summarized into 16 constructs have been published previously [1,7]. For the RATA question format, the 47 terms were randomized for each respondent. Respondents marked the terms that applied and then continued to rate how much each of the checked terms applied on a five-point intensity-type scale. The scale was anchored with "not at all important" at one end and "extremely important" at the other, with internodes of "slightly important", "moderately important", and "very important". The RATING question format, on the other hand, did not provide an option for the respondents to check which terms applied but rather asked them to rate the level of importance or applicability of each of the 47 terms on the same five-point Likert intensity scale that was used for the RATA format. RATING questions were not randomized for each respondent. For RATA, all 47 items were presented on a single page for the respondent to Check All That Apply, followed by separate pages for individually rating each of the selected terms. For the RATING question format, five terms were assessed per page because of computer screen page limitations. These formats are typical of many online or computer-based consumer studies using RATA or rating.
The number of respondents and the number of terms or attributes assessed in the current survey were not unusual. In fact, the literature shows several articles in which a similar number of terms or attributes were evaluated [2,4,48–50].
The subject of the survey questionnaires was consumers' motivations for eating items that belonged to five food groups. The food groups included foods rich in starch (e.g., potato and rice dishes), proteins (e.g., meat, beans), dairy, fruits, and sweet foods/desserts [48,51]. The authors used food items that fit these food groups and were relevant to the particular country [7]. For example, in all countries, bananas were used as the fruit. In the case of starch-rich foods, baked potatoes were used for the USA, while paella was used for Spain and white rice was used for Brazil, China, and India. These foods were chosen based on discussions with multiple sensory scientists in each country, who reviewed and discussed all the foods chosen in all countries to ensure the products represented the "concept" of the food category as much as possible for that country. Where possible, similar foods were used (e.g., white rice in three countries for "starch-rich foods"), though where the product was not widely consumed in that form (e.g., Spain) or not consumed in a similar form by a large percentage of the population (e.g., USA), alternative products that were more commonly eaten were selected.
The online survey questionnaire also included other questions that counted toward the survey timing. For example, two questions investigating the respondents' survey experience, a respondent liking question (a hedonic question) and a rating question on the perceived length of the survey (a just-about-right or JAR question), were included near the end of the survey. The survey liking question and the JAR question were each placed on separate pages. The two survey questionnaires were initially written in English for the respondents in the USA. The approved survey question formats written in American English were then translated into Simplified Mandarin, Hindi, Spanish, and Portuguese for respondents in China, India (English was also provided as an option), Spain, and Brazil, respectively. The survey translation process used a variation of the translation, review, adjudication, pretesting, and documentation (TRAPD) approach [52,53]. The full procedure for the survey methods, including translation, and the surveys in all five languages have been published previously [7].
All subjects gave their informed consent for inclusion before they participated in this online survey. Additionally, the survey was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the designated review board at Kansas State University, Manhattan, KS, USA (IRB #7297.2).

Respondents and Recruitment
Respondents in five countries were recruited by Qualtrics (Provo, UT, USA) using its or its partners' existing databases. Using the Qualtrics survey software, one version of the questionnaire with RATA questions and another with RATING questions were randomly assigned to 400+ respondents per country (n ~ 200 per questionnaire per country) [7]. Respondents were required to be 18 or older and were recruited to fill demographic quotas of age and gender for each questionnaire format (RATA and RATING).
Four age groups (n = 50+ per age group) were used in this study: Generation Z (born 1995 to 2001), millennials (born 1980 to 1994), Generation X (born 1965 to 1979), and baby boomers (born 1944 to 1964). For each age group, 50% were female and 50% were male. Once a particular quota was filled, newly qualified respondents for that quota were discontinued from completing the EMS. Other demographic data collected for informational purposes included respondents' level of education and the numbers of adults and children in their households (Table 1). The respondents in both samples were selected with "matched" demographics of age and gender within each country. The sample sizes were also reasonably large for each group (>200 per group in each of the five countries), and no characteristic (gender, age, education, household numbers) differed significantly between the two samples from each country. Thus, we conclude that any differences noted between the two question formats are likely driven by the formats and not some inherent bias among the respondents.

Comparison of Percentages of "Apply" Responses
Consumers' responses were categorized into two baskets. The first was the "apply" basket, which included RATA and RATING responses where consumers rated the motivation terms or subscales as important ("slightly important", "moderately important", "very important", or "extremely important") to them eating the particular items that belonged to the different food groups. The second basket included the "not at all important" or "not apply" responses for the RATING survey questions. Additionally, this basket included cases where respondents, on second thought, rated a term as "not at all important" even though they had previously checked it as "apply". Data analyses comparing the RATA and RATING question formats in the current study focused on the "apply" responses, e.g., percentages of "apply" responses, standard indices for RATA and RATING, and ratios of RATING to RATA.
The percentages of "apply" responses for RATA and RATING for all 16 motivation constructs for all five food groups for all countries were calculated. Additionally, although the current paper focused on comparisons between RATA and RATING data, Check-All-That-Apply (CATA) "apply" percentages based on RATA data were also calculated. Percentages were used because the possible number of ticks/checks varied depending on how many people ate that particular food in a particular country and the number of subscales in the eating motivation category.
Establishment of Standard Indices for RATA and RATING and Ratios of RATING to RATA
Standard indices of importance (SII) of "apply" responses were determined for all motivation constructs versus liking within the RATA or RATING survey formats. The standard index of importance is an index value that shows the proportion of the number of "apply" responses for any motivation construct to the "apply" responses for the liking motivation [1]. Earlier studies [1,48,49] have shown the liking construct to be the highest motivation on average. Using liking as the comparison index factor (the denominator in the proportion calculation) within each food group, country, and consumer demographic segment allows a within-sample "variable" to be used to adjust all comparisons and put them on a similar "scale" (typically 0-1.0) [1]. Note that it is possible to exceed 1.0 when a motivation exceeds liking in importance for a group of consumers, although this rarely happened. Put simply, the SII is 1.0 when the "apply" responses for a motivation equal the "apply" responses for liking within that method for that group of respondents. Similarly, the SII would be 0.5 when a motivation receives half as many "apply" responses as liking, and so forth. The index was created using the principles espoused by creators of other indices for psychological phenomena that must be compared across various segments that can vary in response behavior [54,55]. If the RATA and RATING formats were assessing the same behavioral patterns of consumers, then the SII values for RATA and RATING for the different motivation constructs would be similar or relatively close. However, if the SII values for the two formats were different, this would indicate that the questions from the two formats were interpreted, processed, and answered differently by the respondents.
Major differences in standard index values for motivations within RATA or RATING would suggest that the results of the two survey formats likely would provide different information to the consumer researchers. Such findings would suggest that RATA and RATING, for various reasons, do not measure the same psychological phenomena or, at a minimum, the results would be interpreted differently [1].
Further, the ratios of percentages of "apply" responses for RATING to RATA were calculated to determine whether the ratio of responses varied or remained the same between the two survey formats.
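The SII and the RATING-to-RATA ratio defined above can be sketched as follows. All counts are hypothetical and the construct names merely illustrate the calculation:

```python
# Hypothetical "apply" counts for one food group in one country.
apply_rating = {"liking": 180, "habits": 162, "health": 144}
apply_rata = {"liking": 120, "habits": 54, "health": 36}

def standard_index(counts):
    """SII: each construct's "apply" count divided by the count for
    liking, so liking itself is always 1.0."""
    liking = counts["liking"]
    return {construct: n / liking for construct, n in counts.items()}

sii_rating = standard_index(apply_rating)  # habits -> 162/180 = 0.9
sii_rata = standard_index(apply_rata)      # habits -> 54/120 = 0.45

# Ratio of RATING "apply" responses to RATA "apply" responses.
ratios = {c: apply_rating[c] / apply_rata[c] for c in apply_rating}
```

In this hypothetical example, the habits construct would show an SII gap of 0.45 between formats, the kind of discrepancy the comparisons below look for.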

Comparison of Mean Scores for All Eating Motivation Constructs
Meyners et al. [12] recommended using mean scores in the analysis of RATA data rather than analyzing RATA data as CATA data. Two issues had to be addressed when using the RATA rating data. First, in cases where respondents changed their mind, i.e., a respondent checked a motivation term as "apply" but then rated it as "not at all important" (suggesting that they should not have checked the term to begin with), that data point was included in the analysis as a "1". Second, in cases where none of the motivations within a construct (usually three motivation subscales per construct) were checked by a respondent, a score of "1" (not at all important) was used in the analysis of the overall construct for that consumer.
Two-sample t-tests were used to compare mean scores for RATA and RATING responses for all 16 constructs for all food categories in all five countries at a significance level of p ≤ 0.05 [3]. Significant differences (p ≤ 0.05) between the means of a particular motivation construct would indicate not only that the consumers interpreted, processed, and answered the respective questions differently but also that the interpretation by researchers could differ.
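The two coding rules and the two-sample comparison can be sketched as below. The per-respondent construct score is taken as the mean of its subscale ratings (an assumption for illustration, not necessarily the study's exact aggregation), the scores are hypothetical, and Welch's t statistic stands in for whatever t-test variant the authors' software computed:

```python
from math import sqrt
from statistics import mean, variance

def construct_score(subscale_ratings):
    """Mean of a respondent's subscale ratings on the 5-point scale.
    None marks a subscale that was unchecked (or checked and then rated
    "not at all important"); both cases enter the analysis as 1."""
    coded = [r if r is not None else 1 for r in subscale_ratings]
    return sum(coded) / len(coded)

# Hypothetical scores for one construct (three subscales per respondent):
rata_scores = [construct_score(s) for s in [(3, None, 4), (1, 1, 2), (5, 4, None)]]
rating_scores = [construct_score(s) for s in [(4, 3, 4), (2, 3, 3), (5, 5, 4)]]

def welch_t(a, b):
    """Welch's two-sample t statistic (unequal variances)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

t_stat = welch_t(rating_scores, rata_scores)  # > 0: RATING mean is higher
```

The resulting statistic would then be compared against the critical value at p ≤ 0.05 for the appropriate degrees of freedom.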

Identification of the Level of Importance for Motivation Constructs
Motivation constructs whose percentages of "apply" responses ranked in the top five for the RATA or RATING survey format for each country for all five food categories were identified [1]. Similarly, motivation constructs whose mean scores for RATA and RATING ranked in the top five within each food category and each country were also identified.

Comparing Significant Differences among Food Categories
Analysis of variance (ANOVA) at a 5% level of significance was used to identify significant differences among the food categories [3,12]. Post hoc mean separation was carried out using Fisher's Least Significant Difference (LSD). This was performed to determine which of the two question formats showed better discrimination ability among samples.
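The one-way ANOVA comparison across food categories can be sketched as follows. The per-respondent scores are hypothetical, and the post hoc Fisher's LSD step is omitted:

```python
from statistics import mean

def anova_f(groups):
    """One-way ANOVA F statistic for k independent groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical construct scores for three food categories:
starch = [3.0, 2.5, 3.5, 3.0]
protein = [4.0, 4.5, 3.5, 4.0]
dessert = [2.0, 2.5, 1.5, 2.0]
f_stat = anova_f([starch, protein, dessert])
# A large F (versus the critical value at p = 0.05) indicates that at
# least one category mean differs, i.e., the format discriminates.
```

A larger F for one question format than the other, on the same constructs, is what "better discrimination ability among samples" refers to above.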

Comparison of Survey Format Completion Rates, Survey Mean Duration, Survey Liking, and Survey JAR Rating
Completion rates for consumers who answered either the RATA or RATING question format of the survey were calculated. Additionally, chi-square tests at a 5% level of significance based on counts of incomplete responses for each format in each country were computed. Further, two-sample t-tests at a 5% level of significance were computed to compare the survey formats' means and standard deviations for consumers' mean survey duration, survey liking, and survey JAR rating in each country.
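The chi-square comparison of incomplete responses can be sketched as a 2x2 contingency test. The counts are hypothetical, not the study's:

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 contingency table;
    rows = question format, columns = [complete, incomplete] counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

# Hypothetical [complete, incomplete] counts for RATA and RATING:
counts = [[200, 20], [200, 40]]
chi2 = chi_square_2x2(counts)
# Compare chi2 against the 1-df critical value of 3.84 at p = 0.05.
```

With these hypothetical counts the statistic exceeds 3.84, which would indicate a significant difference in incompletion between the two formats.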

Comparison of Percentages of "Apply" Responses
In this paper, the term "apply" refers to (a) responses for which the respondents selected motivation terms (in RATA) or (b) responses marked "extremely important", "very important", "moderately important", or "slightly important" in either the RATA or RATING survey format. The RATING question format was associated with a significantly higher percentage of "apply" responses for all 16 motivation constructs for all five food groups in all countries compared to the corresponding CATA and RATA data. For example, in Brazil, the CATA and RATA question formats showed that 48% and 47% of respondents, respectively, identified habits as an important motivation for eating starch-rich foods, while RATING showed that 94% of corresponding respondents identified the habits construct as important (Table 2). Data for protein-rich foods, dairy foods, fruits and vegetables, and dessert foods are presented in Appendix A Tables A1-A4. A similar case was seen in China, where 80% of RATING question format respondents identified visual appeal as an important motivation for eating white rice (a starch-rich food), whereas only 4% and 3% of CATA and RATA responses, respectively, identified the same construct as important. The much higher frequency for "visual appeal" of starch-rich foods in RATING than in RATA in China implies that it may be more important than the RATA or CATA data suggest. It is also possible that the RATING questions overestimated the level of importance of the visual appeal construct or that the RATA questions underestimated the importance of the same construct to the respondents.
The higher percentage of "apply" responses for the RATING survey format was expected based on multiple aspects, but has important implications for researchers. When consumers show that a term or construct is more applicable in one method than another, that shows that the method impacts the interpretation of the information. For example, in the data shown in Table 2, only approximately one-quarter of consumers (or less in some countries) using the RATA format indicated that eating starch-rich foods (i.e., rice or potatoes) was motivated by health concerns. In contrast, for the RATING format, more than three-quarters of consumers in each country indicated that eating such foods was motivated by concerns related to "health". Those findings bring vastly different conclusions about the importance of "health" in selecting such foods. Product developers, sensory and marketing scientists, and nutrition and health professionals would use different strategies to encourage or discourage such consumption depending on which method was used for the research. That points to a major problem and discrepancy that needs to be addressed before a decision is made regarding survey methods. Which method is correct? We cannot know from this research, and further investigation is needed.
Table 2. Percentage of "apply" responses for CATA, RATA, and RATING for all five countries for the respective starch-rich foods.

This suggests that the differences may be an artifact of the testing methodology, arising either from differences in the psychological "threshold" of importance used by consumers in the various methods or from various biases that may be inherent in the methods. RATA respondents checked only terms that "applied" or were important and then rated the level of importance of the selected terms. Not checking a term could be the result of not considering it "important enough" to check. Some respondents may have checked only terms that were of the highest importance to them and, thus, rated only those terms. Inherent biases such as not checking, and subsequently not rating, a term in RATA because the person did not notice it can also occur [4]; if the consumer was rushed, used a small screen, or simply missed a line of print, for example, they could unintentionally fail to check some terms that otherwise might have "applied". That is impossible in a forced testing method such as RATING. Primacy bias (checking terms that occur earlier in the list more often than those that occur later) among RATA "apply" responses can also occur even though terms were randomized across the respondents. It is possible that this bias influenced the total percentage of "apply" responses over all RATA respondents, even though the effect should be small.
It is important to note that the percentage of RATA "apply" responses for all motivation constructs for all food categories in all countries remained the same or decreased slightly when compared to the corresponding CATA data. Prior studies [3,14] that compared CATA and RATA data showed that the percentage of "apply" responses for the attributes increased with the use of RATA compared to CATA. There are two possible explanations for this difference. Firstly, in those studies [3,14], one group of respondents saw the CATA question format of the survey while the other saw the RATA format, whereas, in the current study, respondents saw only the RATA question and we derived the CATA "apply" responses from the first task in the RATA format. This implies that the CATA percentages shown here are part and parcel of the RATA data. Secondly, in some, but not all, RATA question formats [3,14], consumers were asked to check the terms that applied and then rate those they had selected on the same page. It is possible that consumers who see the check box and the rating box on the same page are more likely to select "apply" more often. It is possible that the percentage of "apply" responses for RATA in the present study was lower because the consumers did not see the rating scale for RATA until after they had checked the "apply" response. However, if that were the case, we might have seen increases in "apply" ratings for foods evaluated after the first one, since respondents would have learned that they would be asked to rate the terms they checked as "apply". We did not find that scenario. Regardless, we note that the focus of the current study was on comparisons between RATA and RATING data, not CATA data.
We also found a few cases where RATA respondents changed their minds about the applicability of some motivation terms that they had previously checked as "apply" and instead rated them as "not at all important" or "not apply" for particular foods. This explains the change in percentages of "apply" responses between CATA and RATA (Table 2, Appendix A Tables A1-A4). For example, in Spain, while 35% of consumers marked the three subscales for the habits construct as an important motivation for eating starch-rich foods (CATA), 2% of the same consumers changed their minds and rated it as "not apply" or "not at all important" in the rating portion of the RATA format, resulting in 33% "apply" responses for RATA. This shows that with the RATA question format, respondents took some time to reconsider their previous choices as they rated the selected terms for applicability to the particular food items, something that the CATA question format does not provide for [14]. Nonetheless, just as Vidal et al. [3] stated, the small differences between RATA and CATA that were identified were particular to terms or attributes and food groups.

Ratios of RATING to RATA and Standard Indices for RATA and RATING
The fact that the ratio of "apply" responses of RATING to RATA for all 16 constructs was greater than one reiterated our finding that the RATING question format had a higher percentage of "apply" responses than the corresponding RATA data (Table 3, Appendix A Tables A5-A8). In addition, the importance of eating motivation constructs based on ratios of RATING to RATA varied among the five food groups depending on the country. For example, in India, the importance of convenience in the eating of fruits and vegetables (Table A5) increased 12-fold using RATING, whereas it increased only approximately 6-fold for starch-rich foods (Table A6). For the dairy category (Table 3), it increased 10-fold, while for both the protein-rich (Table A7) and desserts categories (Table A8) it increased 9-fold when the RATING question format was used in India. On the other hand, in China, the importance of the convenience construct increased 23-fold for both the protein-rich and desserts categories when the RATING survey format was used. Except for the liking motivation, similar variations were also noted for the other motivation constructs among the different food groups across the five countries depending on whether the RATA or RATING questionnaire was used.
We did notice, however, that several of the larger differences (30+) in construct importance between RATA and RATING data occurred among motivation constructs that received the lowest ratings overall. Such motivations included affect regulation, social image, and social norms. It is also worth noting that for food groups such as protein-rich foods, dairy, and fruits and vegetables, RATA respondents in Brazil did not consider (neither checked nor rated) any of the three terms or subscales for the affect regulation construct to be important motivations for eating the aforementioned foods (Tables A1-A3). Consequently, the affect regulation construct received zero "apply" responses and zero values for the corresponding standard indices of importance for the RATA question format (Tables 3, A5 and A7).
Overall, the liking motivation construct had a higher percentage of "apply" responses for eating foods from the five food groups across all five countries (Table 4 and Appendix A Tables A1-A4). Regardless of which question format (RATING or RATA) was used, liking was the most important motivation for consumers. These findings support several earlier studies that reported similar results [1,49,56].

Table 3. Ratios of RATING "apply" responses to RATA "apply" responses (R) and standard indices of importance for RATING (S) and RATA (T) "apply" responses for each motivation construct to the liking motivation construct for dairy foods in all five countries.

In consideration of that finding, the authors established standard indices of importance (SII) to identify how the other constructs compared with liking, the greatest motivation construct. A motivation construct with closely similar SII values for RATA and RATING would indicate that the relative importance accorded to it by either format was also closely similar [1]. For example, in Brazil, the difference between the two indices (SII:RATING minus SII:RATA) for the habits construct for starch-rich foods (0.05), protein-rich foods (0.2), and the dairy food category (0.26) could be explained as expected random variation among these values. However, the same cannot be said for the corresponding difference for the fruit and vegetables category (0.53) and the dessert/sweet food category (0.71) in the same country. Clearly, in this case, consumers interpreted, processed, and answered the RATING and RATA questions differently. Similar large differences (SII:RATING minus SII:RATA) were observed for other motivation constructs among the five food groups across all five countries. Further, the differences (SII:RATING minus SII:RATA) in the SII index ranged from 0.28 for the USA protein-rich foods category to 0.95 for that same category in India. At this point, we found that not only was one survey format providing a significantly higher percentage of "apply" responses, but also that the "apply" responses themselves could differ. This strengthens the case for different information being provided by the two question formats, which could result in variations in data analysis, interpretation, and study conclusions by researchers.
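The ratio and index calculations described above can be sketched as follows. This excerpt does not give the exact SII formula, so the sketch assumes the SII is a construct's "apply" percentage expressed relative to the liking construct's; the function names and percentages are illustrative, not the authors' actual analysis.

```python
def ratio_rating_to_rata(pct_rating, pct_rata):
    """Ratio R of a construct's RATING 'apply' % to its RATA 'apply' %."""
    return pct_rating / pct_rata

def standard_index(pct_construct, pct_liking):
    """Assumed standard index of importance: a construct's 'apply' %
    relative to the liking construct's (liking itself scores 1.0)."""
    return pct_construct / pct_liking

# Invented 'apply' percentages for one construct in one food group:
liking_rata, liking_rating = 60.0, 90.0
conv_rata, conv_rating = 5.0, 55.0

R = ratio_rating_to_rata(conv_rating, conv_rata)   # 11-fold increase
S = standard_index(conv_rating, liking_rating)     # SII under RATING
T = standard_index(conv_rata, liking_rata)         # SII under RATA
print(R, round(S - T, 2))
```

Large positive values of S minus T for a construct would flag the format-dependent differences discussed above.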
To correctly understand the perceptions, opinions, beliefs, and attitudes of consumers, researchers (e.g., sensory scientists, product developers, nutritionists, and marketers) should ask the right questions and, even more importantly, survey questions should be structured to collect the most accurate responses. Thus, determining what question format to use would be a critical step in the design process for upcoming online consumer studies. For the current study, we did not conduct exit interviews or focus groups (qualitative research) with the respondents (both RATA and RATING) from the five countries to validate the respective collected data for accuracy. As such, we could not prove that one survey format underestimated or overestimated the consumers' responses. Further research on the validation of RATING and RATA data is warranted. It must be noted, however, that RATING has been the de facto standard for collecting sensory and consumer behavior data for decades. Although that does not mean that it is, in fact, correct, it does suggest that it is incumbent on authors proposing new methods, such as RATA, to show that the data produced are similar to, or better than, those of existing methods.

Comparison of Mean Scores for Eating Motivation Constructs
Results showed that mean scores for RATA and RATING data were similar for some attributes (constructs) for some countries and some food categories. For example, both RATA and RATING survey respondents in China and the USA identified the habits eating motivation as very important to their eating of protein-rich foods (Table 4), dairy foods (Table A9), and fruits and vegetables (Table A10). This could imply that in some situations particular respondents (or, in this case, respondents in certain countries) interpreted and processed the subscales or terms for particular constructs similarly for both RATA and RATING survey questions. Put simply, the same level of importance was placed on attributes/constructs in such cases. However, that was not always true. For example, although consumers in the USA gave the same degree of importance (moderately important) to the habits construct for starch-rich foods using either format (Table A11), for dessert foods RATA respondents reported habits to be "very important" while corresponding RATING respondents found it to be "moderately important" (Table A12).
Except for China, where seven, eight, and ten constructs were found to have similar mean scores for the two question formats for the protein-rich, dairy, and dessert food categories, respectively, other countries each had at most only five of the 16 constructs with similar mean scores for the two question formats across all food categories. RATA respondents from all countries indicated that social image was a very important motivation for eating protein-rich foods. However, corresponding RATING data for consumers in Western societies (Brazil, Spain, and the USA) identified the same construct as slightly important, whereas RATING data for China and India categorized social image as "moderately important". Obviously, it would be illuminating to compare the impact of consumers' demographic aspects on the RATA and RATING "apply" responses; however, that was not the objective of this paper.
Consumers' RATA mean scores in Brazil, India, Spain, and the USA for close to three-quarters (11/16) of the constructs for all food categories were significantly higher (greater level of importance) than the corresponding RATING scores. In fact, in Spain, only two motivation constructs had similar mean scores for RATA and RATING for any food category; at least fourteen had significantly higher mean scores based on RATA questioning as compared to RATING in every food group. This shows that consumer insights gathered using the RATA and RATING question formats in online surveys may not necessarily be the same. We also noted that, overall, the mean score values for the RATA question format were higher than those of the corresponding RATING data for all food groups in all countries. This was true for all constructs except the habits and convenience constructs, regardless of whether the differences were statistically significant. There are two explanations for this occurrence. First, the RATA question format requires respondents to select only attributes that are important and then rate the selected terms for "applicability" or level of importance. It can be assumed that all terms selected at this point do "apply", though they may apply at different levels of importance. The RATING question provides no "opt out" option but rather asks consumers to rate all statements or terms on a scale from 1 to 5, where 1 means "not at all important"; if that score is chosen, the construct mean will decrease. It would appear that for the ratings used during RATA, consumers were more likely to choose higher importance scores since they had already stated that the motivation terms or statements were applicable. Furthermore, the five-point scale that was used included a "not at all important" option, which gave RATA respondents an "out" in case they changed their mind (i.e., checked a term as "apply" but then rated it as "not at all important"). Such responses were included in the analysis and coded as 1, which could increase the mean score slightly for RATA data. The case was not the same for the RATING responses, which were treated as is [24-26,57].
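The mean-score logic above can be sketched as follows. It assumes, consistent with the text, that RATA means are computed only over checked (rated) terms, with changed-mind answers kept as 1, while RATING means use every respondent's rating as-is; the toy numbers are invented to illustrate why RATA means tend to run higher.

```python
def rata_mean(responses):
    """Mean importance for one RATA term.
    responses: list of (checked, rating) pairs, rating on a 1-5 scale.
    Unchecked terms carry no rating and are left out of the mean
    (an assumption: RATA respondents rate only what they check);
    checked-but-'not at all important' answers are kept as 1."""
    ratings = [r for checked, r in responses if checked]
    return sum(ratings) / len(ratings)

def rating_mean(ratings):
    """Mean importance for one RATING term: all ratings used as-is."""
    return sum(ratings) / len(ratings)

# Toy illustration: consumers who would give low RATING scores
# simply skip the term in RATA, raising the RATA mean.
rating_scores = [1, 1, 2, 4, 5, 5]
rata_responses = [(False, None), (False, None), (False, None),
                  (True, 4), (True, 5), (True, 5)]
print(rating_mean(rating_scores))  # 3.0
print(rata_mean(rata_responses))   # ~4.67
```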
We also noted eight cases, linked to the habits and convenience constructs for particular food categories in Brazil, China, and the USA, where the mean score for RATING was slightly higher than the corresponding RATA value. However, of these eight cases, only for the habits construct under the starch-rich food category in China did the mean score for RATING differ significantly from that of RATA (Table A11).
It is also important to note that differences between mean scores for RATING and RATA were smaller for constructs that were most frequently used by respondents. In India, for example, the differences for frequently used motivations (e.g., liking (0.52), habits (0.32) and convenience (0.32)) for the protein-rich category were less than the corresponding differences for infrequently used motivations (e.g., social image (0.87), affect regulation (0.96), and weight control (0.89)). As demonstrated also by the ratio of RATING to RATA "apply" responses and standard indices for the two question formats (Table 5, Appendix A Tables A5-A8), this indicates that the level of importance is likely to vary more among less-frequent and moderately used attributes or motivation constructs than frequent ones depending on whether RATA or RATING was used within a food category within a country.

Based on Percentages of "Apply" Responses
Inspection of the relative positioning of the motivation constructs for RATA and RATING data for each country provided more understanding of the level of importance consumers accorded to each construct for the different food categories. Based on percentage of "apply" responses, consumers (both RATA and RATING respondents) in Western countries (Brazil, Spain, and the USA) identified the liking construct as the most important motivation for eating foods from all five categories (Figure 1a). This was not surprising since similar findings were attained by other authors in prior related studies [1,49,56].
However, for the Asian countries (China and India), while liking maintained the topmost position for food categories such as desserts, its ranking varied inconsistently for other food categories. Rate-All-That-Apply responses in China, for example, suggested that liking, pleasure, and need and hunger were the leading drivers for the eating of protein-rich foods, in that order, while RATING "apply" responses pointed to need and hunger, liking, and habits, in that order, as the constructs that drove consumers in China to eat protein-rich foods. A similar case was seen in India, where RATA data showed that consumers ate starch-rich foods mostly because they liked them, while corresponding RATING data indicated that Indians ate starch-rich foods mainly out of habit.
Eating motivations such as habits, need and hunger, convenience, and pleasure joined the liking construct among the top five constructs for eating behavior across most food categories and countries, regardless of whether RATA or RATING survey question formats were used. Furthermore, traditional eating, natural concerns, health, weight control, and sociability were the other eating motivation constructs that also appeared among the top five positions, though these depended on the survey question format used, the food category, and the country of the target population. It is worth noting, however, that the level of importance for the latter set of constructs differed between RATA and RATING data more frequently than for the former set of motivation constructs. This further suggests that although some similarities between RATA and RATING data can be found, information collected using RATA questions and that collected using RATING questions may be different and may be interpreted differently, potentially leading to different conclusions and decisions.

Another way to look at the ranking of attributes based on level of importance is to compare the data analysis approaches that were used for RATA and RATING data. To calculate the percentages of "apply" responses, RATA data were treated as CATA data, implying that the consumers' responses were analyzed as binary numbers (1, 0), where a value of 1 was assigned to each subscale or term that was selected as an important motivation for the consumption of a particular product category and a value of 0 to each subscale or term that was not selected [12].
For RATA, the ratings or intensity scores were ignored except in cases where respondents changed their mind, i.e., checked a motivation term as "apply" but then rated it as "not at all important" (suggesting that they should not have checked the term to begin with); such data points were excluded from the analysis. Although the data show that this rarely happened (<2% of cases), it decreased the percentage of "apply" responses and the ranking slightly for RATA data. On the contrary, when computing the percentages of "apply" responses for RATING data, all response categories but "not at all important" were categorized as 1 and only the "not at all important" responses were categorized as 0.
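The binary coding rules just described can be sketched as follows. The function names and the 1-5 response codes are illustrative assumptions; a checked RATA term rated 1 ("not at all important") is treated here as not applying, consistent with the 35% to 33% Spain example discussed earlier.

```python
def rata_apply(checked, rating):
    """Binary 'apply' coding for one RATA response (rating on 1-5,
    where 1 = 'not at all important'). A checked term rated 1 is a
    changed-mind answer and is treated as not applying."""
    return 1 if checked and rating > 1 else 0

def rating_apply(rating):
    """Binary 'apply' coding for one RATING response: every category
    except 'not at all important' (1) counts as apply."""
    return 1 if rating > 1 else 0

def percent_apply(binary_responses):
    """Percentage of 'apply' responses across respondents."""
    return 100.0 * sum(binary_responses) / len(binary_responses)

# Toy example: four respondents for one motivation term under RATA.
rata = [rata_apply(c, r) for c, r in
        [(True, 4), (True, 1), (False, None), (True, 5)]]
print(percent_apply(rata))  # 50.0
```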
Additionally, those differences may depend on the particular product or sample (or food category) and country or culture of target population. Careful consideration is therefore recommended for consumer researchers when determining what question format to use in future online surveys for particular products because the level of importance given to each attribute or term may change depending on what survey format a respondent answers and the particular product(s) being assessed. More investigation into the accuracy of both methods may be needed before suggestions for use of one question format over the other can be made.

Based on Motivation Constructs' Mean Scores
Across all five countries, the RATA question format gave a larger variety of top five motivation constructs based on mean scores than the corresponding RATING format. This may be the result of differences in actual motivations across product categories that show up using the RATA format, or it could be an artifact of testing. We note that the RATING format produced more consistent top five motivation constructs, whether determined from the percentage of "apply" responses or from mean scores, than did the RATA format. In RATA, the top five constructs sometimes changed depending on which measure was used.
In Brazil, except for convenience, which was replaced with natural concerns for the dairy food category, the same constructs were identified either based on percentages or mean scores for all five food categories using RATING. Another example was seen in Spain, where the same key constructs were pinpointed based on percentages or mean scores for all food categories except for habits. Habits was replaced with traditional eating among the top five motivations for eating starch-rich foods when the mean scores for the constructs were compared using RATING data. That was not the case for the RATA survey format. We also noted that for the RATING survey format, the motivation construct ranking within the food categories did not change much, particularly in Western countries (Brazil, Spain, and the USA), and when the ranking did change, the constructs' positions moved only slightly. Conversely, for the RATA question format, several infrequently used motivation constructs such as affect regulation, social norms, social image, and choice limitation joined the list of top five constructs. For example, based on percentage of "apply" responses, both RATA and RATING identified liking as the most important motivation for eating starch-rich foods, followed by habits. However, RATA mean scores suggested that affect regulation followed by natural concerns took the lead, and liking came in fifth.
Habits did not appear in the top five positions for motivations for eating starch-rich foods in Brazil. As for the RATING survey format, liking and habits both maintained their lead as key motivations for eating of starch-rich foods in Brazil. These findings suggest that ranking of attributes based on level of importance was more consistent for the RATING question format but changed significantly for RATA depending on whether ranking was based on percentages of "apply" responses or mean scores for attributes or constructs.
Meyners et al. [12], who analyzed RATA data both as CATA and parametrically, found that RATA data were more meaningful when treated parametrically. At the time of writing, we did not find any research that provides further insights into how RATA and RATING compare in terms of discrimination among products or the degree of importance or applicability of attributes. More investigation is needed to better understand the ranking of attributes based on attribute percentages of "apply" responses and attribute mean scores.

Significant Differences among Samples
Overall, the total number of cases where the eating motivation constructs had significant differences among the food groups or samples for the RATING question format (n = 67) was higher than that of RATA (n = 20) (Table 5). This pattern held in all five countries. For example, in both Brazil and the USA, the RATA question format identified only four cases where significant differences were found among the samples, whereas, for RATING, the number of significant differences among samples was more than 3-fold higher (n = 14). Clearly, RATING was more discriminating among samples than RATA. This finding can be key for consumer researchers when designing future online surveys that investigate characteristics of products or samples that are similar or closely related. Although it is not known which of the significant differences are "true", the RATING format has been the gold standard for many decades and does appear to give somewhat different results than RATA. Further work is needed to determine the impacts of such differences on findings in sensory and consumer behavior studies.
Comparison of Survey Format Completion Rates, Survey Mean Duration, Survey Liking, and Survey JAR Rating

Consumers' Survey Question Format Completion Rates
Chi-square tests showed that the percentages of incomplete responses for RATING data for Brazil, India, and China were significantly higher than those of the corresponding RATA data (Figure 2). This information could be beneficial when planning future international consumer studies (with RATA or RATING questions) in these five countries or countries with similar cultures.
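A completion-rate comparison like the one reported can be run as a Pearson chi-square test on a 2x2 contingency table of complete versus incomplete responses per format. The sketch below uses only the standard library (the chi-square(1) p-value via `math.erfc`); the counts are invented for illustration and are not the study's data.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2
    table [[a, b], [c, d]]; returns (statistic, p_value) with 1 d.f."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x/2))
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p

# Hypothetical (complete, incomplete) counts for one country:
stat, p = chi2_2x2(190, 10,   # RATA
                   170, 30)   # RATING
print(p < 0.05)  # True: completion rates differ significantly
```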

Consumers' Survey Mean Duration
In China and Spain, consumers who answered the RATING format of the online survey took a significantly longer time to complete the survey compared to their counterparts who completed the RATA survey format (Table 6). That was not unexpected, especially since RATA respondents rated only those terms or attributes that they considered to "apply", whereas RATING respondents rated all 47 terms. It was, however, surprising to note that in the USA, consumers took slightly longer (although not significantly) to complete the RATA questions than the RATING questions.

Consumers' Survey Just-about-Right (JAR) Rating
Apart from India, respondents in the other four countries rated the RATING version of the survey as a little too long, while the RATA version was rated as JAR (Table 7). In India, however, the RATING survey format was rated as JAR, while the RATA format of the same survey was rated as a little too short. Except for the USA, this finding can be explained by the longer time that respondents needed to complete the RATING format of the survey. These findings suggest that neither format was overly burdensome to those who completed the questionnaire, but other factors such as survey liking and completion rates may be important.

Consumers' Survey Liking
In all five countries, the RATA versions of the survey were liked significantly more than the corresponding RATING versions (Table 8). The higher liking for the RATA survey format was consistent with the JAR survey ratings. It must be noted that for both formats, the mean values for liking were positive, suggesting that, at least for many consumers, the format they used in testing was acceptable.
In the case that the information collected from the shorter surveys satisfies the research objectives, then there may be no need to conduct longer surveys, especially since longer surveys would cost more. On the other hand, longer surveys could be used in place of shorter surveys in cases when more robust information is needed from the consumers. Additionally, longer surveys could negatively affect the online survey completion rates, which could increase the difficulty in attaining the required number of complete responses. However, survey duration and completion rates should not be used as a key basis for determining what question format to use in online survey questionnaires considering that quality of data could be impacted.

Study Limitations
It is possible that a proportion of the target population did not participate in this online survey (coverage error) simply because they lacked access to a stable and steady internet connection [58,59]. This implies a limitation to the inferences that can be made based on the current internet survey. According to Armstrong et al. [60], differences in response data, particularly in multi-country online surveys, can be ascribed in part to different recruitment software. Although we used the same software in all countries, the actual devices used (i.e., computer, phone, etc.) likely differed from country to country and may have had some impact on the results. Additionally, paper ballots could be included in future survey designs as an option for respondents within the target populations who may not have access to the internet. Other survey limitations, such as the selection of particular samples (food groups), have been discussed previously [1].

Conclusions
This online survey showed that the RATING question format provided more "apply" responses for each attribute than the RATA question format. Based on the standard indices for RATA and RATING, the RATING question format also discriminated better among attributes for all food categories in all countries than the corresponding RATA data. In addition, the RATA mean scores for the attributes were, overall, significantly higher (greater level of importance) than those of the RATING survey format. Further, the RATING question format discriminated better among food categories or samples than RATA for all motivation constructs or attributes and within all countries. More investigation into the use of the RATA and RATING question formats in future consumer research is needed.

Informed Consent Statement:
A notice of informed consent was provided to all consumer survey participants who were anonymous to the researchers.

Data Availability Statement:
The data from this study are not publicly available.

Conflicts of Interest:
The authors declare no conflict of interest.
Appendix A Table A1. Percentage of "apply" responses for CATA, RATA, and RATING for all five countries for the respective protein-rich foods.