Concurrent and Convergent Validity of a Single, Brief Question for Physical Activity Assessment

An extensive number of self-reported methods for physical activity (PA) measurement are available, including short and long recall questionnaires ranging from a few to tens of questions. Due to the fact that simple, time-saving methods could be more practical and desirable for use in a busy clinical context, as well as in public health surveys, we evaluated how a single-item question might be a useful and cost-effective method for assessing compliance with PA guidelines. Using multiple receiver operating characteristics (ROC), we assessed the classification performance of a single brief question, employing the short version of the International Physical Activity Questionnaire as criterion instrument, in a total of 55,950 people (30,601 women and 25,349 men). Both those who practice PA almost daily and a few times a week presented an upper threshold (1042.5 metabolic equivalent minutes (MET) minutes/week) to the established compliance PA guidelines (600 MET minutes/week) with high specificity and sensitivity, using a sedentary group as reference. Otherwise, the occasionally physically active group did not reach the minimum (349.5 MET minutes/week) and obtained a poorer classification performance. A single brief question is a pragmatic and alternative method for assessment of compliance with PA guidelines.


Introduction
The correct assessment of a lifestyle pattern such as physical activity (PA) in clinical and epidemiologic context implies remarkable importance and consequences in order to determine its dose-response relationship with mortality and morbidity [1,2], surveillance of physically inactive population trends across time allowing comparisons [3], or to simply control and classify PA as risk factor into inactive or active behavior. It is found that no practice, and absence of PA, are linked to higher morbidity and all-cause mortality, and lower life quality and quantity, highlighting its influence on non-communicable diseases (NCDs) which is equal to, or even higher than, other lifestyle behaviors as dietary, alcohol use, or tobacco use [1,4,5].
Such evaluation could be conducted by means of diverse methodologies and instruments, as either self-reported or direct methods, which involve different and noteworthy applications [2,6]. Examples of direct methods are the use of pedometers, accelerometers, and heart rate monitors; in turn, shortand long-term recall questionnaires, diary notebooks, interviews, and brief questions are encountered among self-reported instruments. Direct measures lessen assessment errors, but some of them are expensive, and almost all mainly require a notorious burden of time (at least one week for most measurement protocols), as well as create logistic issues, such as individual or patient implication [2,6,7]. Contrariwise, self-reported measures are cost-effective tools with high efficiency and capability to easily collect information from large populations or groups; however, these types of instruments present some inherent limitations and biases [2,6]. Thus, the chosen method is underpinned to several factors as assessment purposes, collecting time period, logistics, or effectiveness, among others, and hence, there is not a global gold-standard instrument to evaluate PA pattern.
In certain clinical contexts and large epidemiological studies, the use of self-reported measures could be more feasible. These instruments commonly assess PA by an individual's recall process in a week, month, or even in more timelines, but they may induce a recall bias, especially during completing large questionnaires, through under-and overestimating PA [2,6]. Besides this, the completion time of widely-used questionnaires is around 10 minutes in short versions, a brief amount of time, but larger for instance, than the average primary care physician consultation time, which is 5 minutes or less in half of the worldwide population, limiting some practical applications [8]. This amount of time could be even larger, whether questionnaires are administered using interviews or by telephone, instead of self-administered forms. Thus, the necessity to employ other instruments and methods which are even shorter, but equally efficient, effective and valid at same time, arise. Few studies have addressed this approach, and mainly focus on older adults with modest sample sizes in their validation procedures [9,10].
The use of a single brief question was therefore proposed as a useful and feasible alternative which, although cannot assess total quantity of PA, at least enables classification into active or inactive in overall population in order to identify as risk factor and/or controlling in epidemiological research, or for screening and exercise prescription in busy primary health care settings. Thereby, we here analyze this assumption through using a brief PA frequency question in comparison to a short-term recall questionnaire as the widely-used International Physical Activity Questionnaire (IPAQ) [11], in a total of 55,950 individuals from 2017 and 2013 Eurobarometers, and the classification of performance into physically active and inactive.

Data
Data was retrieved from Eurobarometer survey series. In the present study, Eurobarometer 88.4 [12] and 80.2 [13] from 2017 and 2013, respectively, were selected, as both surveys include IPAQ and a single brief PA frequency question. Each Eurobarometer consists of a cross-sectional survey across the 28 European Union country members, with a representative sample size of approximately 1000 individuals per country, collecting between both surveys a total of 55,950 responders aged 15 and over (30,601 women and 25,349 men).
Sampling was performed by a multi-stage random methodology, drawing several sample points in each country with conditional probability, according to its population size and density among the different stratums and individual units. Age, gender, region, and size of region were used in the iteration process to generate these sample points. Eventually, each participant was randomly selected in each household, and interviews were conducted face-to-face by professional interviewers.

Physical Activity Assessment
Physically active status was considered as compliance with at least one of the World Health Organization (WHO) guidelines: 150 minutes of moderate PA per week, 75 minutes of vigorous PA per week, or any equivalent combination of vigorous and moderate PA [14]. In order to simplify, the WHO guidelines compliance was converted to metabolic equivalent minutes (MET) minutes/week by setting a threshold at 600 MET minutes/week. PA was evaluated by both short-term recall questionnaire employing the short version of IPAQ [11], and a single brief question.
IPAQ measures PA in a usual week across frequency (number of days) and duration (number of minutes per day) at different intensity levels being vigorous, moderate, and walking. In both Eurobarometers, duration is categorically evaluated by the following answers: "Never do any vigorous (or moderate) physical activity or Never walk for 10 min at a time"; "30 minor less"; "31 to 60 min", "61 to 90 min"; "91 to 120 min"; "More than 120 min". Categorical duration responses were then transformed using median values for each answer. For example, a duration of 45 minutes was used for the 31 to 60 min response. For those who reported Never practice physical activity (or never walk for 10 min) and More than 120 min, a zero value and 135 min were applied, respectively. Subsequently, weekly time at each intensity was computed to Energy Expenditure (EE), expressed as MET per week, that is, the EE compared to resting values [15]. Thus, vigorous, moderate, and walking weekly times were multiplied by 8, 4, and 3.3, respectively, and finally summarized to obtain total EE in a usual week. Further information related to METs for different activities is provided elsewhere [16].
Here we study a possible alternative way to rapidly and easily evaluate PA, using the following brief question that was also used in Eurobarometer survey series: "How often do you exercise, play sport, or engage in other physical activity, such as cycling from one place to another, dancing, gardening, etc.? Possible answers were practice PA "Never"; "Occasionally"; "Few times a week", or "Almost daily".

Data Filtering and Statistical Analyses
Several data filters were applied in order to eliminate potential biases. Firstly, only adults (aged 18 and over) were selected, as PA guidelines are considerably different between adults and adolescents. Secondly, all those responders who reported Do not know, or contained missing values either in IPAQ or the brief question, were removed. Furthermore, illogical cases were also eliminated, such as reporting performance of PA 2 days per week and Never practice physical activity, or 0 days per week and 61 to 90 min in the same intensity component (vigorous, moderate and walking) of IPAQ form. This kind of data represents a recall bias and could be capable of negatively affecting statistical analyses, estimations, and interpretation of the results.
Classification of performance was assessed by multiple Receive Operating Characteristic (ROC) curves through MET minutes/week between those who Never practice PA, and responders who do Occasionally, Few times a week, and Almost daily. Therefore, dummy variables were created with Never group as reference. Cutoffs were determined by optimizing for both sensitivity (Se) and specificity (Sp). Area under the curve (AUC), positive predicted value (PPV), negative predicted value (NPV), positive likelihood (LH+), and negative likelihood (LH−) were also computed, with 95% confidence interval (95% CI). Rstudio (3.6.1 version) was used to run all statistical analyses and data filtering.

Results
From a total initial sample size of 55,950 subjects (28,301 in 2017 and 27,919 in 2013), data filtering removed 1103 (1.97%) who were aged under 18; 2,677 (4.78%) who reported Do not know as the answer or contained missing values in any of studied variables, and 12,791 (22.86%) illogical cases when reporting IPAQ questions, resulting in a final sample size of 39,379 individuals (70.39%).
Estimated cutoff and corresponding classification performance values are presented in Table 1. For responders who practice PA Almost daily and Few times a week, the cutoff was set at 1042.5 MET minutes/week, whereas those who practice Occasionally had a lower cutoff set at 349.5 MET minutes/week. Moreover, regarding to performance parameters, AUC became greater as frequency grew higher, from Occasionally (0.751; 95% CI: 0.743 to 0.759) to Few times a week (0.894; 95% CI: 0.891 to 0.898), up to Almost daily (0.947; 95% CI: 0.945 to 0.950). Sensitivity, Specificity, and LH+ were also higher among Almost daily responders and lower in Occasionally; conversely LH-was higher in Occasionally (0.401; 95% CI: 0.383 to 0.421) and lower in Almost daily (0.127; 95% CI: 0.119 to 0.135). PPV and NPV performance were diverse across groups with no trends. The Occasionally group reached a considerably low PPV (0.47; 95% CI: 0.461 to 0.486), in contrast to those who reported practicing PA Almost daily (0.862; 95% CI: 0.855 to 0.871) and Few times a week (0.904; 95% CI: 0.898 to 0.907), while the later also presented the lowest NPV (0.772; 95% CI: 0.765 to 0.783). From a visual perspective of classification performance, the Occasionally ROC curve differs substantially with respect to the others in a similar way to performance parameters estimated by cutoffs ( Figure 1). On the contrary, Almost daily and Few times a week ROC curves present visually slight differences, and seem to be more similar, compared to the Occasionally group.  From a visual perspective of classification performance, the Occasionally ROC curve differs substantially with respect to the others in a similar way to performance parameters estimated by cutoffs ( Figure 1). On the contrary, Almost daily and Few times a week ROC curves present visually slight differences, and seem to be more similar, compared to the Occasionally group.

Discussion
The ROC results of the present study have shown strong performance in classifying as physically active those who practice PA Almost daily or Few times a week, with high specificity and sensitivity, especially in the former. The cutoff of 1042.5 MET minutes/week for both groups is placed sufficiently above the threshold of 600 MET minutes/week established compliance with PA guidelines, according to WHO [14]. Nonetheless, classification of performance among those who Occasionally practice PA, compared to the sedentary group, was limited, with a cutoff of 349.5 MET minutes/week, lower than the PA compliance threshold. As such, this diminished performance suggests that both groups (Never and Occasionally) present many similarities; hence, such differentiation results in complex generation of misclassifications. Adding estimated cutoffs to these misclassification problems supports the hypothesis that those who practice PA Occasionally could be considered as physically inactive individuals.
With regard to clinical context, the current self-reported PA assessment tools used in primary care do not present satisfactory properties with respect to validity and reliability [17]. Furthermore, people are more prone to recall PA that involves heavy load or considerable effort (i.e., vigorous activities) than moderate and light activity in daily life [11]. However, the contribution of these vigorous activities represents a small proportion of total time and EE, being more common than light and moderate PA [18]. Recall questionnaires such as IPAQ, which assess PA by splitting into three intensities, may have this type of bias. Therefore, although the exhaustive assessment of PA is also important in order to determine dose-response relationship or simply to control cofounder effects, the use of a single brief question with a global PA assessment could be more suitable, and entails important practical applications, either on large epidemiological surveys or on busy and hasty primary care consultation, allowing for time saved in classification of PA pattern. Other studies have validated similar brief questions and questionnaires [9,10], nonetheless, the employed samples were based on specific population groups, such as older adults, and with limited sample sizes, restricting results' extrapolation and their applications in other contexts. In contrast, the present study includes a notorious and relevant sample size in a large spectrum of European adults, not heretofore provided in this type of single brief question validation study related to PA assessment.
Besides time-saving applications formerly mentioned, the use of this brief approach to assess PA could also involve strong implications in epidemiological research, either for retrospective purposes or future research. Many cross-sectional survey series present different self-reported instruments, thus, intra-and intercomparability among these surveys would be improved with better PA prevalence surveillance and control, even in already existing surveys, reducing blank periods. Likewise, it should be mentioned that there are a great number of epidemiological studies which measure PA prevalence using this single brief question, and this study would reinforce case control and multiple cross-sectional studies. Lastly, further future research ought to be considered due to the scarce number of this type of validation study of brief questions, especially in different contexts and regions such as individuals with disabilities, or in developing countries.
In conclusion, the use of a single brief question could be a pragmatic alternative method to assess compliance with PA guidelines of activity, from those who reported practicing PA Few times a week or Almost daily, to inactive Never and Occasionally groups, according to poorer classification performance in the latter. This approach may be suitable in a clinical context as well as in epidemiological and public health research.

Conflicts of Interest:
The authors declare no conflicts of interest.