Reliability, Validity, and Gender Invariance of the Exercise Benefits/Barriers Scale: An Emerging Evidence for a More Concise Research Tool

The Exercise Benefits/Barriers Scale (EBBS) research instrument has been extensively used to investigate the perceived benefits and barriers of exercise in a range of settings. In order to examine theoretical contentions and translate the findings, it is imperative to implement measurement tools that operationalize the constructs in an accurate and reliable way. The original validation of the EBBS proposed a nine-factor structure for the research tool. Previous studies that examined the EBBS factor structure suggested that various factors are important for testing the perception of exercise benefits and barriers, whereas a few items and factors may not be vital. The current study conducted a confirmatory factor analysis (CFA) using hierarchical testing in 565 participants from the northwest region of the United Kingdom. The results provided evidence for a four-factor structure of the benefits measure, namely life enhancement, physical performance, psychological outlook, and social interaction (Comparative Fit Index (CFI) = 0.943, Tucker–Lewis Index (TLI) = 0.933, and root mean square error of approximation (RMSEA) = 0.051), as well as a two-factor structure of the barriers measure, namely exercise milieu and time expenditure (CFI = 0.953, TLI = 0.931, and RMSEA = 0.063). For a six-factor correlated model, the CFI was 0.930, the TLI was 0.919, and the RMSEA was 0.046. The multi-group CFA provided support for gender invariance. The results indicated that, three decades after the original validation of the EBBS, many of the core factors and items are still relevant for the assessment of higher-order factors; however, the 26-item concise tool proposed in the current study is more parsimonious than the original 43-item questionnaire.
Overall, the current study provides support for a reliable, cross-culturally valid EBBS within the UK adult population; however, it proposes a shorter and more concise version compared with the original tool, and gives direction for future research to focus on content validity when assessing the perception of barriers to physical activity.


Introduction
The health benefits of physical activity (PA), especially for the prevention of noncommunicable diseases, and the recommendations for minimum PA to secure such health benefits have been well-documented [1]. Despite this, approximately 3.2 million deaths per year have been attributed to a lack of engagement in PA, and this has established physical inactivity as the fourth universal risk factor of mortality [1]. Thus, it is imperative to understand the barriers and facilitators of engagement in PA. Although obtaining such an understanding does not necessarily lead to enhancement of engagement in PA, such understanding is essential for providing more in-depth knowledge about the adherence to PA guidelines, the etiology of physical inactivity, and developing policies and strategies for facilitation of PA behavior [2].
Amongst several theories aiming to explain human health behavior, the Health Belief Model is one of the most widely used applied theories [3], postulating that risk susceptibility, risk severity, self-efficacy, and cues to action, together with perceived benefits of action and perceived barriers to action, are the six key constructs predicting health behavior [4]. Among these constructs, perceived barriers and benefits are of particular importance, as it was previously demonstrated that the ratio of perceived barriers to perceived benefits is a strong predictor of an individual's health [5] and PA [6] behavior. Previous studies in older adults demonstrated that the perception of PA benefits can take an intrinsic orientation, represented in perspectives such as health promotion, self-confidence, and well-being, or an extrinsic orientation, via improving social interactions, receiving financial rewards, or attendance in a convenient environment [7]. The perception of barriers to PA is equally important: actual or imagined barriers, inconveniences, predicaments, financial constraints, and the range of expenses linked with adherence and childcare, together with perceived risks to safety and security, have previously been reported as adversely affecting PA behavior [8].
Previous studies have identified that, amongst adults, physical limitations due to pain and weakness, a lack of motivation, and a lack of time are the strongest perceived barriers to PA, whereas family relationships, social support, and potential health benefits are the strongest motivators [9]. In young adults, where external barriers and health education demonstrated a strong association with PA behavior [10], a study demonstrated that the availability of places to exercise and the convenience of the exercise schedule, together with physical exertion (hard work, tiredness, and fatigue) from exercise, were the strongest perceived PA barriers [11]. Amongst older adults, the review of Schutzer and Graves (2004) classified poor health, availability of a physical environment, a lack of advice from the physician, and a lack of knowledge and understanding of the relationship between PA and health, together with a pattern of insufficient PA during childhood and youth, as the established constraints to PA, while an individual's belief in their ability to successfully perform (i.e., self-efficacy), continuous contacts and prompts, and environmental factors, such as appropriate music, were classified as the motivators to exercise [12]. Despite these broad themes, the determinants of PA behavior and the perceptions of barriers to and facilitators of such behavior are population-specific, depending on the biological, psychological, social, and cultural characteristics of the population [2]. Thus, there is a need for a validated tool that can identify and examine the perceived barriers and facilitators of PA in different populations. Table 1 summarizes the literature on the perception of the benefits and barriers of physical activity with a methodology and population comparable to the current study.
In the UK, El Ansari and Lovell [13] and Lovell, El Ansari, and Parker [6] investigated the barriers to exercise in non-exercising younger and older female adults using the barriers subscale of the Exercise Benefits/Barriers Scale (EBBS) in relation to family relationships and responsibilities. However, a relatively small sample size, the focus on barriers (in particular, in relation to the number of children), inadequate attention to confounding factors, narrow age and gender categories of the participants, and, most importantly, the descriptive analysis of the data limited the generalizability and usefulness of the findings.
The EBBS research instrument, as employed by El Ansari and Lovell [13], has been extensively used to investigate the perceived benefits and barriers of exercise in a range of settings, populations, and conditions, for instance in Mexican-American women [14], midlife Australian women [15], older African women [16], and Iranian women [17]; in people with physical disabilities and chronic health conditions [18]; in patients with multiple sclerosis [19] or with HIV [20]; in overweight and obese women with polycystic ovarian syndrome and as part of a randomized controlled trial [21]; as part of an investigation into physiological and perceptual responses to Latin partnered social dance [22]; in relation to cardiac rehabilitation [23,24]; in parents and preschool-age children [25,26]; and as part of an investigation of the perception of a yoga-based fall prevention program in older adults [27]. These studies implemented the EBBS with reference to the original validation study by Sechrist, Walker, and Pender in 1987 [28].
The original validation of the EBBS, conducted by Sechrist and colleagues [28] in the USA, used a large sample of 664 adults (mean age = 38.7 years) and exploratory factor analysis (EFA) with varimax rotation to assess the factor structure. The results provided support for a 43-item and nine-factor solution, with five factors composing the benefits subscale and four factors the barriers subscale. Although no item cross-loadings were found between benefit and barrier subscales or within the barrier subscale, a number of items cross-loaded within the benefits subscale. The item factor loadings for all factors were acceptable, and ranged between 0.46 and 0.85. Factor loadings for first-order factors varied between 0.56 and 0.77. The nine-factor solution explained 64.9% of the variance. In addition, Cronbach's alpha provided support for the reliability of the EBBS.
Following the validation study by Sechrist and colleagues [28], the EBBS nine-factor structure has been assessed on various occasions. In contrast to the original validation study, the subsequent factor analytical results only partially confirmed the original structure. The EFA assessment of the EBBS by Akbari Kamrani et al. (2014) accounted for 61.8% of the variance [29]. Although the amount of variance explained was similar to that in the Sechrist et al. study [28], in a sample of 388 older participants (above 60 years of age), Akbari Kamrani et al. (2014) found a 10-factor solution, yielding five benefit and five barrier factors, and two items showed cross-loadings with other factors [29]. Similarly, within the study of Brown (2005), including 398 psychology students, the use of EFA also yielded a 10-factor solution [30]. Interpretation of the factor structure was based on a factor loading cutoff of 0.45, and factors with fewer than three items were omitted (for instance, family discouragement, with two items). Overall, a total of 17 items did not load on any factor, with loadings below 0.45. The final seven-factor solution supported five benefit factors and two barrier factors, explaining 38.1% of the variance [30]. Using a sample of 409 nursing students, the EFA assessment of the EBBS conducted by Ortabag et al. [31] also supported a seven-factor structure, with five benefit and two barrier factors, accounting for 57.1% of the variance. In sum, the original 43-item, nine-factor structure proposed by Sechrist et al. [28] was only partially supported by follow-up studies. Thus, based on EFA results, it appears that the EBBS might be effective with fewer items and fewer factors. Benefit factors that have been confirmed repeatedly include physical performance, psychological outlook, social interaction, and preventive health, whereas barrier factors were reflected by physical exertion and exercise milieu [29–31].
These inconsistent findings reveal a gap in knowledge: despite the frequent use of EFA and substantial support for part of the EBBS factor structure, the psychometric assessment of the EBBS still lacks a rigorous confirmatory factor analysis.
The present study conducts a confirmatory factor analysis as part of its study of validity, within the domain of construct validity. Conceptually, construct validity refers to the extent to which the scores of the EBBS are consistent with the hypotheses that they were generated for (based on the presumption that the EBBS indeed validly measures the construct to be measured). The fundamental perspectives within this domain of validity are structural validity, hypotheses validity, and cross-cultural validity [32]. Through incorporating CFA as part of our broad analysis, the study attempts not only to reduce the overall number of observations into latent factors based on the commonalities observed in the data, but also (and more importantly) to minimize measurement errors and to facilitate inferential analysis and the comparison of alternative a priori models [33]. CFA therefore offers an opportunity to statistically examine the structure of groups within our target population, facilitate the investigation of cross-cultural validity, and give adequate confidence for confirming the validity of the instrument and its sub-domains, or for developing an amended version of the research tool, if necessary. The overall purpose of the current study was to examine the EBBS reliability, validity, and gender invariance. To meet this broad goal, we set three aims linked with elements of the study.
Cronbach's alpha for the global EBBS and the EBBS subscales has been tested previously [28–31], showing values above the 0.70 cutoff [39]. The first aim of the study was to re-test the reliability of the EBBS and its underlying factors in our target population (Aim 1). Previous studies testing the EBBS factor structure [28,30,31] indicated that various factors are essential for the testing of exercise benefits and barriers, whereas a number of items and factors are not vital. In contrast to previous EFA approaches, we intended to assess the latent factor structure of the EBBS using CFA (Aim 2). An important aspect when testing exercise benefits and barriers is the potential moderating effect of gender. A methodological shortcoming of previous research was the testing of gender differences based on mean differences, such as t-tests [29]. A more thorough investigation of gender differences is the assessment of gender invariance based on covariance matrices. Therefore, we intended to examine the potential gender differences between exercise factors and item understanding using multi-group confirmatory factor analysis (Aim 3).

Participants
A total of 565 participants (males, n = 244; females, n = 321) between 15 and 41 years of age (M = 21.19, SD = 2.42) took part in this study. Data were collected from university students from Merseyside, UK. Most participants reported being single (n = 454; in a relationship, n = 101) at the time of data collection. For more information about the study population and the recruitment procedure, please see our earlier publications [40,41].

Instrument
The Exercise Benefits/Barriers Scale (EBBS; Sechrist et al., 1987) was developed to assess people's perceptions of exercise benefits and barriers to exercise. Sechrist et al. (1987) developed the EBBS with nine factors and 43 items, with 29 items assessing exercise benefits and 14 items assessing exercise barriers [28]. The EFA at the time revealed five factors underlying the benefits of exercise, which were labelled life enhancement (seven items in total; item example: "Exercise improves the quality of my work"), physical performance (nine items in total; item example: "Exercise increases my level of physical fitness"), psychological outlook (six items in total; item example: "I enjoy exercise"), social interaction (four items in total; item example: "Exercising is a good way for me to meet new people"), and preventive health (three items in total; item example: "I will live longer if I exercise"). Four factors depict barriers to exercise, termed exercise milieu (six items in total; item example: "It costs too much to exercise"), time expenditure (three items in total; item example: "Exercising takes too much of my time"), physical exertion (three items in total; item example: "Exercise is hard work for me"), and family discouragement (two items in total; item example: "My family members do not encourage me to exercise"). The response scale is a four-point Likert-type scale anchored by 1 (strongly agree) and 4 (strongly disagree).

Procedures
Following approval by the university's ethics committee, the study was promoted across the universities in Merseyside, UK. The current investigation is part of a large cross-sectional study, where participants were recruited and attended two clinical visits for physical and physiological assessments in the first visit, and demographic and questionnaire completion within the second visit. Standard consent procedures were followed before questionnaire administration. Participants filled in the demographic questionnaire and the EBBS, together with other questionnaires on eating disorders, stress, depression, and anxiety. They completed the hard copy of the EBBS and demographic questionnaire battery within 10 min, and the data were anonymously transferred to Microsoft Excel spreadsheets, and were examined for accuracy and potential data entry errors before the statistical analysis.

Statistical Analysis
Confirmatory factor analysis (CFA) was undertaken using AMOS 23.0 software. Model estimations were based on maximum likelihood methods [42]. In order to establish and assess multi-factor measurement models, we adopted a strategy by Jöreskog (1993), who proposed a model-generating stage before testing the measurement model [43]. First, we assessed one-factor congeneric models for each of the nine EBBS factors. Congeneric models are the smallest entity of a measurement model, outlining the regression of a number of observed variables (items) on a single latent variable (factor). This allows re-specification of the model (i.e., freeing covariances, item deletion) before calculating the overall measurement model for exercise benefits and exercise barriers (Aim 1). We then examined the overall fit of the entire EBBS scale (Aim 2). The parsimonious fit of the measurement models was assessed with the chi-square test (χ²); the incremental fit was examined through the Comparative Fit Index (CFI) [44] and the Tucker–Lewis Index (TLI) [45]; and the absolute fit was analyzed through the standardized root mean square residual (SRMR) [46] and the root mean square error of approximation (RMSEA) [47]. Based on the standards developed by Hu and Bentler [48], a very good fit of the data is achieved when CFI and TLI scores exceed 0.95, the SRMR is below 0.08, and the RMSEA is below 0.06. Once the best-fitting model was established, we used measurement invariance techniques, as proposed by Gregorich (2006), Byrne (2004; 2010), and Cheung and Rensvold (2002) [49–52], to assess the covariance structure of the main models across gender (Aim 3).
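As an illustration of the incremental fit indices and cutoffs named above, the following minimal Python sketch computes the CFI and TLI from a model chi-square and a baseline (independence model) chi-square using their standard textbook definitions, and checks a set of indices against the Hu and Bentler thresholds quoted in the text. The function names and the chi-square values in the usage lines are hypothetical and serve only as an example; they are not results of this study, and the analysis itself was run in AMOS, not with this code.

```python
def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative Fit Index from model and baseline (independence) chi-squares."""
    d_model = max(chi2_model - df_model, 0.0)  # noncentrality of the tested model
    d_base = max(chi2_base - df_base, 0.0)     # noncentrality of the baseline model
    worst = max(d_model, d_base)
    return 1.0 if worst == 0.0 else 1.0 - d_model / worst


def tli(chi2_model, df_model, chi2_base, df_base):
    """Tucker-Lewis Index; not normed, so it can fall outside the 0-1 range."""
    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    return (ratio_base - ratio_model) / (ratio_base - 1.0)


def very_good_fit(cfi_value, tli_value, srmr, rmsea):
    """Cutoffs as quoted in the text: CFI and TLI > 0.95, SRMR < 0.08, RMSEA < 0.06."""
    return cfi_value > 0.95 and tli_value > 0.95 and srmr < 0.08 and rmsea < 0.06


# Hypothetical chi-square values, for illustration only (not taken from the study).
example_cfi = cfi(120.0, 50, 1500.0, 66)
example_tli = tli(120.0, 50, 1500.0, 66)
print(round(example_cfi, 3), round(example_tli, 3))
```

Because the TLI penalizes model complexity through the chi-square to degrees-of-freedom ratio, it is typically somewhat lower than the CFI for the same model, which is the pattern seen throughout the fit indices reported in this study.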

Reliability
Cronbach's alpha showed adequate values for the EBBS (α = 0.82) and for the benefits (α = 0.83) and barriers (α = 0.81) subscales. Most subscales showed adequate Cronbach's alpha values above the 0.70 threshold [39]. While the benefit subscales ranged between 0.71 and 0.85, the barrier subscales showed some values below the acceptable limit. Exercise milieu (α = 0.73) and time expenditure (α = 0.70) were just above the cutoff value, whereas the internal consistency of the physical exertion factor was below the cutoff (α = 0.62), and family discouragement showed a particularly low score of 0.47. Given that this factor was constructed from only two items, which may have contributed to the low alpha score, it was decided to omit the family discouragement factor, excluding items 21 and 33 from further analysis.
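For readers unfamiliar with the statistic, Cronbach's alpha can be sketched in a few lines of Python from the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of the total score). The function name and the small response matrix below are hypothetical and purely illustrative; they are not EBBS data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*rows))                            # transpose to per-item columns
    item_var_sum = sum(variance(col) for col in items)  # sample variances (n - 1)
    total_var = variance([sum(r) for r in rows])        # variance of the summed score
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical responses of five people to a three-item subscale (1-4 Likert scale).
toy_responses = [
    [1, 2, 1],
    [2, 2, 2],
    [3, 3, 4],
    [4, 4, 3],
    [2, 3, 2],
]
print(round(cronbach_alpha(toy_responses), 2))
```

With only two items, the factor k/(k − 1) equals 2 and alpha depends entirely on a single inter-item covariance, which is one reason two-item factors such as family discouragement tend to yield unstable reliability estimates.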

Correlations
In Table 2, the correlation matrix revealed good support for the two main constructs of exercise benefits and barriers. All benefit factors showed strong, positive correlations with the benefits measure, ranging between 0.66 and 0.84. The correlation coefficients between the barrier factors and the barriers measure also displayed strong, positive links, ranging between 0.60 and 0.80. The relationship between benefits and barriers was reflected by a significant, negative coefficient of −0.39.

Convergent Validity
For the exercise benefit higher-order factor, all items showed adequate factor loadings between 0.49 and 0.79. Only one item of the life enhancement factor (item 29, "Exercise helps me decrease fatigue") showed a slightly lower loading of 0.39. For the barrier factor, items generally varied in factor loadings between 0.46 and 0.77. Item 28 ("I think people in exercise clothes look funny") revealed a loading of 0.35. Except for physical performance, the congeneric models showed a very good fit of the data in relation to CFI values of 0.95 and above. The TLIs were more variable, with only two factors showing an excellent fit, that is, life enhancement and time expenditure.
Almost all RMSEA values were slightly elevated, i.e., above 0.08 [48]. Life enhancement and exercise milieu showed adequate scores, with promising 90% confidence intervals for the RMSEA. Even though all regression weights were significant and the fit indices showed some support for the factors' convergent validity, some scores appeared rather high and could cause issues in further model testing. Therefore, we examined all models with RMSEAs above 0.08 more closely. The preventive health model revealed a poor absolute fit index, with an RMSEA of 0.133 and a confidence interval ranging between 0.070 and 0.209 (χ²(1) = 10.971, p < 0.001; CFI = 0.970; TLI = 0.910; SRMR = 0.035). The RMSEA is moderately sensitive to simple model misspecifications, which can occur in congeneric model testing, but is more sensitive to complex model misspecifications [46]. Following up the elevated RMSEA scores, we used the Lagrange Multiplier test for standardized residual covariances to examine model misspecifications.
Misspecifications occurred for all three items, with strong cross-loadings, i.e., above 10, and strongly covarying error terms, ranging from 14 to 32. Acceptable limits for covariances would be lower than 4 [50]. Therefore, we decided to delete items 5 ("I will prevent heart attacks by exercising"), 13 ("Exercising will keep me from having high blood pressure"), and 27 ("I will live longer if I exercise").
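The RMSEA point estimate reported above for the preventive health model can be reproduced approximately from the reported chi-square, degrees of freedom, and sample size. The sketch below uses the common single-group formula with N − 1 in the denominator; this convention and the function name are assumptions made for illustration, since the text does not state the exact computation used by the software.

```python
from math import sqrt


def rmsea(chi2, df, n):
    """RMSEA point estimate for a single-group model, assuming the N - 1 convention."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Values reported in the text for the preventive health congeneric model:
# chi-square(1) = 10.971 with N = 565 participants.
preventive_health = rmsea(10.971, 1, 565)
print(round(preventive_health, 3))  # 0.133
```

With a single degree of freedom, even a modest chi-square yields a large RMSEA, which is consistent with the caution expressed in the text about interpreting this index for small congeneric models.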
The RMSEA (0.090) for physical performance was also inflated. The Lagrange Multiplier test detected higher scores for the regression weights and error covariances for items 23 ("Exercise improves my flexibility"), 31 ("My physical endurance is improved by exercising"), and 32 ("Exercising improves my self-concept"). Following the deletion of these three items, the remaining six-item factor of physical performance showed adequate scores at this level of model testing (Table 3). Testing psychological outlook, the six-item congeneric model also revealed an elevated RMSEA of 0.106, indicating model misspecifications. The Lagrange Multiplier test for standardized residual covariances indicated cross-loadings above 10 between items 8 and 20. In addition, the error terms for both items substantially co-varied, with M.I. = 25.221. We deleted item 20 ("I have improved feelings of well-being from exercise") and re-ran the model presented in Table 3. The absolute fit of social interaction was marginally elevated, with an RMSEA of 0.088. Modification indices suggested slightly increased error terms between 4.0 and 5.5. In this instance, we decided to retain all four items of the social interaction factor.
With regard to barriers to exercise, exercise milieu showed adequate fit indices, indicating that the data fit the model well. Time expenditure showed a slightly increased RMSEA, but, on closer inspection, all items and covariances indicated appropriate scores. Physical exertion, however, showed some serious model misspecifications, with χ²(1) = 36.908, p < 0.001; CFI = 0.831; TLI = 0.494; SRMR = 0.103; RMSEA = 0.252, confidence interval 0.187–0.325. Within the three-item factor, items 19 ("I am fatigued by exercise") and 40 ("Exercise is hard work for me") loaded on each other, with M.I. = 33.013 and M.I. = 32.272, respectively, and both error terms co-varied substantially, with M.I. = 31.670. Because two out of the three items measuring physical exertion showed severe misspecifications, we deleted both items and omitted this factor from further analysis.

Hierarchical Confirmatory Factor Analysis
When examining multi-group invariance, Byrne [50,51] proposed a testing strategy that involves a number of hierarchical steps. Firstly, a baseline model needs to be established for each sample. Byrne (2010) suggested that testing should be guided by the best fit of the data and model parsimony [50]. In this section, we tested a number of measurement models to establish the best fit. The best-fitting models for the benefit, barrier, or overall EBBS scale were used in further analysis to examine gender differences. Based on the result of the congeneric model testing, we examined a four-factor exercise benefit and three-factor exercise barrier model.
Testing the benefit scale first, the results of the CFA showed a poor fit of the data for the single-factor model, the second-order model, and the four-factor uncorrelated model (Table 3). The four-factor correlated model (M4) showed a slightly better fit, with χ²(164) = 586.433, p < 0.001; CFI = 0.914; TLI = 0.903; SRMR = 0.049; and RMSEA = 0.058. The correlated model largely reflected the original five-factor model proposed by Sechrist et al. [28], and showed the best fit of the data; it was therefore used for further analyses (Table 4). A number of items in the four-factor solution appeared to be problematic. Cross-loading items and modification indices for regression weights with scores above 10 and covariances above 4 (Lagrange Multiplier test for standardized residual covariances) indicated a poor fit of three items. Two life enhancement items and one psychological outlook item were detected as causing misspecifications.
Examining the barrier scale, the results of the CFA showed a poor fit for the single-factor and the uncorrelated two-factor models, whereas a better fit was found for the second-order model and the two-factor correlated model. Judged against the criteria of Hu and Bentler (1999) [48], model M7 almost reached excellent fit. In addition, error covariances were found for items with overlapping content, i.e., items 2 ("Exercise decreases feelings of stress and tension for me") and 3 ("Exercise improves my mental health"). Hierarchical CFA testing allowed us to re-specify the benefit, barrier, and EBBS measurement models. With regard to the second aim, it appears that the benefit and barrier models should be preferred for further testing, as fit indices showed stronger values for the four- and two-factor models. Items that were retained following the hierarchical CFA and their factor loadings are presented in Tables 5 and 6.

Gender Invariance
For multi-group invariance testing, Gregorich (2006) proposed a hierarchical approach to the testing procedure, including the comparison of three models assessing metric invariance (factor loadings constrained to be invariant), strong invariance (factor loadings and item intercepts constrained to be invariant), and strict invariance (factor loadings, item intercepts, and item residual variances constrained to be invariant) against a configural model that is freely estimated with no constraints [49]. Support for metric invariance would indicate whether factors have the same meaning across national samples, strong invariance indicates whether comparisons between latent mean differences will be meaningful, and strict invariance takes into consideration item residual invariance, providing an even stronger basis for cross-cultural comparisons [49].
In Table 7, the comparison between the configural and metric models showed acceptable values for all group comparisons, with ∆CFI < 0.01 [52]. Gregorich (2006) stated that this comparison is particularly important, as it indicates whether items have the same meaning between groups [49]. Testing the barriers model, the results indicated that all items had the same meaning across gender groups (M2).
Note to Table 7: Kline (2005) proposed that items which are fixed to 1.0 cannot be examined for invariance. Therefore, these items were freed, and the latent parent variables were fixed to 1.0. NS = not significant.
When strong invariance was tested by constraining the factor loadings and item intercepts of the model, and judged against Cheung and Rensvold's (2002) criterion of ∆CFI < 0.01 [52], the results showed strong support for the model comparison between male and female data, with ∆CFI = 0.008. Gregorich (2006) proposed that the equivalence of factor loadings and item intercepts is a precondition for testing latent mean differences [49], as applied by Vlachopoulos et al. (2013) [53]. The current results indicated that it would be appropriate to proceed with this type of analysis to compare the mean differences between males and females in their perception of exercise barriers. Finally, strict invariance was not evident for the two gender models, with ∆CFI = 0.012. The test of mean differences showed no significant results.
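The ∆CFI decision rule applied above can be expressed as a short Python sketch. It walks through the invariance steps in hierarchical order and stops at the first step whose ∆CFI reaches the 0.01 cutoff. The function name and the list-of-pairs input format are hypothetical; the two ∆CFI values are those reported in the text for the barriers model (the metric step is omitted because only ∆CFI < 0.01 is reported for it).

```python
def supported_levels(steps, cutoff=0.01):
    """Return labels of consecutive invariance steps whose |delta CFI| is below the cutoff."""
    supported = []
    for label, delta_cfi in steps:
        if abs(delta_cfi) >= cutoff:
            break  # a failed step ends the hierarchy; stricter levels are not supported
        supported.append(label)
    return supported


# Delta-CFI values reported in the text for the barriers model across gender.
barrier_steps = [("strong", 0.008), ("strict", 0.012)]
print(supported_levels(barrier_steps))  # ['strong']
```

The early exit mirrors the hierarchical logic of the procedure: once a level of invariance fails, more restrictive levels are not interpreted.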
In Table 8, the comparison between the configural and metric models for exercise benefits also showed acceptable values for all group comparisons, with ∆CFI < 0.01 [52], although the results of M2 indicated a significant difference between the configural and metric models, with p < 0.05. A shortcoming of this technique is that it only compares all factor loadings across groups at once, and does not pinpoint differences in the items of specific subscales across groups. Although Cheung and Rensvold (2002) stated that the measurement model is completely invariant to the configural model when ∆CFI is less than 0.01 [52], Byrne (2010) considered ∆χ² and ∆df in conjunction with statistical significance to be more stringent [50]. More importantly, non-invariance testing can then proceed on a subscale level, which allows the results to pinpoint which items are invariant and which are non-invariant across groups [51]. With regard to the data, the comparison between the data from each gender did not show substantial differences on a factor-loading level, with ∆CFI = 0.003, indicating that factor loadings were invariant across gender groups, whereas the p-value, p < 0.05, indicated a significant difference between the models. This may suggest that specific items have different meanings, or are understood differently, between males and females.
Following the test of the metric model, which had all factor loadings constrained, Byrne (2010) suggested constraining the items of the first latent variable, in this case, items of the barriers factor, to be equal across groups, and estimating the remaining items freely for each sample [50]. Therefore, when "factor-loading parameters are found to be invariant across groups, their specified equality constraints are maintained, cumulatively, throughout the remainder of the invariance-testing process" (p. 223). This procedure is shown in Table 8 (see Models 3 to 6). When items for each subscale were constrained separately, all subscale items for life enhancement, physical performance, and social interaction performed well and supported invariance across groups. For the psychological outlook factor, one item had to be estimated freely ("I enjoy exercise"), indicating significant differences in item meaning between groups. Model M7 showed no substantial differences, with ∆CFI = 0.001. The subsequent test of mean differences was not significant.
Note to Table 8: Kline (2005) proposed that items which are fixed to 1.0 cannot be examined for invariance. Therefore, these items were freed and the latent parent variables were fixed to 1.0. LE = life enhancement; PP = physical performance; PO = psychological outlook; SI = social interaction; NS = not significant. Item 1, "I enjoy exercise", of the psychological outlook factor had to be estimated freely.

Discussion
The overall purpose of the current study was to investigate the reliability and validity of the EBBS questionnaire and to examine whether the EBBS is invariantly perceived across gender. To the best of our knowledge, this is the first and most detailed and extensive cross-gender investigation of the reliability and validity of the EBBS that has uniquely and systematically conducted a CFA in addition to the previous tests and examined the perceived gender invariance of the EBBS factors and items. Our findings partly confirm the results of the previous EFA of the EBBS, which identified a core of factors and items within the EBBS that are relevant and essential to the validity and reliability of the questionnaire across populations.
The first aim of the study was to examine the reliability of the EBBS on a factor and global level. The results showed strong evidence for the internal consistency of the global measures of benefits and barriers, which supports the findings of Sechrest et al. [28]. Some Cronbach's alpha values were below the desirable 0.70 cutoff [39], namely those for physical exertion and family discouragement. The two-item factor of family discouragement showed a particularly low alpha value of 0.46 and was omitted from further analysis. Previous research also indicated variability in alpha values, as reported by Akbari Kamrani et al. [29] for psychological outlook (0.58), exercise milieu (0.65), and time expenditure (0.60). These findings are similar to those of previous studies that also found reliability issues on a subscale level (e.g., Brown [30]; Ortabag et al. [31]).
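For readers less familiar with the statistic, the following minimal Python sketch shows how Cronbach's alpha is obtained from raw item scores using the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score). The function name and the data layout are assumptions made for illustration; this is not the software used in the study.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for one subscale.

    items: list of columns, one list of participant scores per item,
    all of equal length (hypothetical layout chosen for this sketch).
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha requires at least two items")
    # Total score per participant across the k items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))
```

For a two-item subscale such as family discouragement, alpha depends only on how strongly the two items covary, which is one reason why very short subscales tend to produce low values.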
The second aim of this study was to investigate the latent factor structure of the EBBS. Following rigorous assessments at the item level (i.e., congeneric model testing) and at the first- and second-order levels (i.e., hierarchical CFA testing), the results provided evidence for a four-factor structure of the benefits measure, namely life enhancement, physical performance, psychological outlook, and social interaction, and a two-factor structure of the barriers measure, comprising exercise milieu and time expenditure. The current findings largely support the initial EBBS nine-factor structure proposed by Sechrest and colleagues [28]. Following the validation stage, Akbari Kamrani et al. [29] and Ortabag et al. [31] extracted five of the six factors found in this study through exploratory factor analyses. Based on the evidence provided in this and previous studies, there appears to be a core of factors consistently supporting the measurement of exercise barriers and benefits.
Factors that have not been supported in the present and previous studies might lack relevance (i.e., preventive health and physical exertion) or psychometric soundness (i.e., family discouragement). For instance, family discouragement showed a weak alpha value of 0.46 (Table 2) and consisted of only two items, making it prone to estimation problems [54]. Kline (2005) proposed that any measurement factor should consist of at least three indicators [54], and this suggestion was taken into consideration by Brown [30], who omitted family discouragement due to this violation. Furthermore, we found no support for the physical exertion and preventive health factors at the congeneric model testing level; that is, all three items of each factor showed severe misspecification and were omitted from further analysis. This was due to a particularly high RMSEA, which is sensitive to the number of items tested and favors model parsimony [55]. The results showed that the chosen item estimates did not fit the covariance matrix [56]. Preventive health did not emerge as a relevant factor in previous research (e.g., Akbari Kamrani et al. [29]), and neither did time expenditure (e.g., Brown [30]; Ortabag et al. [31]). It could be argued that preventive health is a rather broad factor, and participants, particularly younger age groups, such as student samples, may not be concerned with health issues, such as high blood pressure or heart attacks, that may or may not occur in the distant future.
The correlation between global measures of benefits and barriers was significant at r = −0.39. This finding is similar to Brown [30], with r = −0.46. On a subscale level, we generally found moderate to strong correlations (Table 2). Brown [30] highlighted that measures of benefits and barriers bear importance for the EBBS, but he advised against the separation of both measures. In the current study, the hierarchical model testing confirmed sound psychometric properties and excellent indices for the correlated four-factor benefits model, the correlated two-factor barriers model, and the correlated six-factor model (Table 4). These results indicate that future research could use measures of exercise barriers and exercise benefits either separately or in conjunction, depending on the specific aims of the study.
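As a reminder of how the fit indices referred to in this section are defined, the minimal Python sketch below implements the textbook formulas for the RMSEA, CFI, and TLI from the chi-square statistics of the tested model and of the baseline (independence) model. The function and parameter names are assumptions made for illustration, and some software divides by N rather than N − 1 in the RMSEA, so the values reported in Table 4 may not be reproduced exactly.

```python
from math import sqrt


def rmsea(chi2, df, n):
    """Root mean square error of approximation (N - 1 convention)."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2, df, chi2_base, df_base):
    """Comparative Fit Index relative to the baseline model."""
    d_model = max(chi2 - df, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def tli(chi2, df, chi2_base, df_base):
    """Tucker-Lewis Index; it can fall outside the 0 to 1 range."""
    ratio_base = chi2_base / df_base
    return (ratio_base - chi2 / df) / (ratio_base - 1.0)
```

Because the degrees of freedom appear in the denominator of the RMSEA, a model with the same excess chi-square (chi-square minus degrees of freedom) but more degrees of freedom, that is, a more parsimonious model, obtains a lower value.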
The original EBBS by Sechrest et al. [28] incorporated 43 items, whereas the current 26 items strongly supported the six-factor measure. The current model indicates parsimony, and the factor loadings for the retained items (Tables 5 and 6) showed convergent validity, with loadings generally above 0.45. Tabachnick and Fidell [57] argued that item loadings below 0.45 can cause problems in factor structure interpretation, as they would explain less than 20% of the variance. One item of the life enhancement factor ("Exercise helps me decrease fatigue") showed a loading of 0.35. Despite the recommendations by Tabachnick and Fidell [57], we decided to retain this item because the overall performance of the life enhancement factor was strong across congeneric and hierarchical model testing.
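The 20% figure follows from squaring the loading. Assuming standardized loadings, denoted here by the symbol \lambda (introduced only for this illustration), the proportion of item variance explained by the factor is \lambda^2:

```latex
\lambda = 0.45:\quad \lambda^{2} = 0.45^{2} = 0.2025 \approx 20\% \\
\lambda = 0.35:\quad \lambda^{2} = 0.35^{2} = 0.1225 \approx 12\%
```

Under this assumption, the retained fatigue item shares roughly 12% of its variance with the life enhancement factor.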
The third aim of the study was to examine potential gender differences in exercise factors and item understanding using multi-group confirmatory factor analysis. This type of assessment is relevant for making appropriate cross-cultural group comparisons [52]. When all factor loadings for the benefits and barriers subscales were constrained to be equal, the results showed no significant differences between male and female participants. Out of the 26 items, only one item, "I enjoy exercise" (psychological outlook factor), had to be estimated freely, indicating that males and females have a different understanding of what exercise enjoyment means. Although some support for metric invariance has been found, strong invariance at an intercept level could not be confirmed, and no latent mean differences were found. The results generally support the notion that the EBBS, more specifically the six-factor version tested in this study, is equally valid to use with male and female participants.
The results of this study add the following information to the extant literature. The psychometric properties of the amended, parsimonious EBBS are sound, providing support for its reliability and validity. Methodologically, the current study goes beyond previous EFA testing, and incorporates a rigid systematic protocol for testing the latent factor structure of the EBBS, consisting of first-order factor analysis (i.e., congeneric models), second-order factor analysis (i.e., CFA of the benefits and barriers model, and for the full EBBS model), and multi-group CFA to examine gender differences in the use and understanding of the EBBS. The use of this protocol was based on contentions by key references on model testing, including Byrne [50], Cheung and Rensvold [52], Gregorich [49], and Hu and Bentler [46,48]. Previous research on EBBS model testing did not follow such a systematic test protocol, which appears to be a methodological shortcoming. Future research testing the psychometric properties of the EBBS should follow clear guidelines on the testing procedures.
The current study was conducted within the framework of a large cross-sectional study called the Collaborative Investigation in Nutritional Status of Young Adults (CINSYA), which recruited a large sample of participants via convenience sampling, mainly young adults studying in higher education institutions in Merseyside, England [40,41]. Ideally, the investigation of the reliability, validity, and gender invariance as seen within the current study would require a large but heterogeneous sample to be representative of the broad target population, and we feel that the rather narrow age range (mostly 18-25 years) represented in the current study may limit the generalizability of our results. Therefore, we believe that our findings should be considered in view of the above limitation. A further methodological weakness that should be addressed is the lack of separate calibration and validation samples [43]. Ideally, two independent samples should be used: a calibration sample to generate an initial model and establish its factor structure, followed by a validation sample to confirm this initial structure [43]. In this study, the latent factor structure was based on Sechrest et al. [28], who incorporated a large sample of 664 participants in their EFA test procedure. Researchers should consider the advantages of a stronger methodological approach, such as the inclusion of two samples. This approach could be used in cross-cultural studies trying to establish the cross-cultural validity of the EBBS. This would be helpful, as previous research has been conducted with samples from Iran (Akbari Kamrani et al. [29]), Turkey (Ortabag et al. [31]), and China (Guo [58]), but so far only Guo [58] implemented EFA and CFA analyses, although s/he did not use two independent samples.
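Where two independently recruited samples are not available, a random split of a single sample is sometimes used as an approximation of the calibration and validation design. The minimal Python sketch below illustrates such a split; the function name, the seed, and the use of participant identifiers are hypothetical, and a split-half design was not used in the current study.

```python
import random


def split_calibration_validation(participant_ids, seed=0):
    """Randomly split one sample into calibration and validation halves.

    This only approximates the two-sample design described above; it does
    not replace two independently recruited samples.
    """
    ids = list(participant_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]
```

The factor structure would be explored in the first half (e.g., by EFA) and then tested in the second half by CFA, at the cost of a smaller sample size for each analysis.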
The focus of the current study was on the reliability, validity, and gender invariance, and not the actual investigation of the benefits and barriers of PA amongst the target population. Despite this, we decided to collate a summary of the previous studies across the world to facilitate the comparison of our subscale means of the perceived benefits and barriers of PA against the previous studies with comparable methodologies and populations (Table 9). While most mean perceived benefit subscales seem to be generally in line with the mean subscales reported elsewhere, the mean perceived barriers of PA (e.g., in family discouragement, exercise milieu, and time expenditure) within the current study are not consistent with the other scores previously reported. Such cross-cultural inconsistencies in the perceived subscales are not unexpected; however, we consider these in view of our findings, and argue that these inconsistencies might partially be the reflection of the rigor of a useful research instrument that would need further investigation, revision, and cross-cultural validation.

Conclusions
The psychometric properties of the EBBS provide support for a reliable and valid operationalization of exercise benefits and exercise barriers. The results indicated that, in comparison with the original validation of the EBBS by Sechrest and colleagues [28], a number of core factors and items are still relevant for the assessment of higher-order factors. The current 26-item version, in contrast to Sechrest et al.'s [28] 43-item version, displays better parsimony, although, in agreement with Brown [30], the number of barrier factors could be increased to avoid under-powering of future assessments.
The current study contributes to the literature at operational and conceptual levels. While previous factor analyses, including the studies of Akbari Kamrani et al. [29], Brown [30], and Sechrist et al. [28], used exploratory factor analysis, the current study replicated the factor structure based on the previous evidence through a statistically rigorous approach using congeneric model testing and confirmatory factor analysis, including hierarchical models and gender invariance testing. Our findings indicated strong support for the scale's internal consistency. In contrast to the previous studies proposing a 10-factor structure of the EBBS, our CFA-based findings could only confirm a 6-factor structure containing two factors associated with the perception of barriers and four with the perception of benefits. The remaining four factors revealed substantial issues on an operational level, and future studies need to be wary when using the factors in question. If all EBBS factors are to be applied in the future use of the instrument, the rigor of the study can be improved by conducting additional factor analysis as part of the study, which may potentially lead to omitting the factors that are underperforming on a measurement level. Our study is an example in which not all items performed well because of limited reliability, cross-loading, or a lack of gender invariance. Researchers using the EBBS cross-culturally need to be aware of these shortcomings in the preparation, design, and implementation of their studies.
Future research needs to validate additional content areas of exercise barriers, as the current and previous findings (e.g., Brown [30]) suggested an imbalance between benefits, i.e., four or five factors, and barriers, i.e., two factors. The results of the present and previous studies (e.g., Akbari Kamrani et al. [29]; Brown [30]; Ortabag et al. [31]) indicated a number of core barriers, namely exercise milieu and time expenditure, although additional barriers were formed in previous EFAs. The results of this study also underlined the value of two global categories (exercise benefits and exercise barriers) for future research as relevant constructs that could be used on their own or in conjunction.
Author Contributions: F.A. designed the study and led the application for ethical approval, participant recruitment, and operation and data collection of the project. S.K. formulated the research questions, led the statistical analysis and interpretation of the data, and wrote the initial draft of the manuscript. F.A. revised the manuscript critically for intellectual content. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.

Institutional Review Board Statement:
The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the institutional ethics committee.