Measurement Invariance of the Flourishing Scale among a Large Sample of Canadian Adolescents

Our aim was to examine measurement invariance of the Flourishing Scale (FS), a concise measure of psychological wellbeing, across two study samples and by population characteristics among Canadian adolescents. Data were retrieved from 74,501 Canadian secondary school students in Year 7 (2018–2019) of the COMPASS Study and from the original validation study of the FS (n = 689). We assessed measurement invariance using multiple-group confirmatory factor analysis, in which increasingly stringent equality constraints were specified for model parameters between the following groups: study sample (i.e., adolescents vs. adults), gender, grade, and ethno-racial identity. In all models, full measurement invariance of the FS was demonstrated across all subgroups. Our findings support the validity of the FS for measuring psychological wellbeing among Canadian adolescents in secondary school. Observed differences in FS scores among subgroups therefore represent true differences in wellbeing rather than artifacts of differential interpretation.


Introduction
In recent years, the adolescent health literature has placed increasing focus on positive psychology. This shift away from an emphasis on dysfunction and psychopathology is commensurate with dual continua models of mental health and illness [1,2] and newer theories of positive youth development [3–5]. Factors grounded in positive mental health and resilience, such as social and psychological wellbeing, have been identified as protective against the development of mental disorders (e.g., anxiety, depression) [6,7]. Flourishing, for example, is a state of overall wellbeing used to describe the presence of mental health [1]. Adolescents who are flourishing are more likely to thrive academically [8] and may be less likely to engage in potentially harmful behaviours, including bullying involvement [9], binge drinking [10], and cannabis use [11,12]. Importantly, because adolescence is a developmental period often involving risk-taking and experimentation [13,14], individuals' overall sense of wellbeing may even buffer the mental health sequelae of such behaviours [11,15].
The secondary school environment presents an opportunity for interventions that aim to foster adolescents' psychological wellbeing and resilience. Large data systems, such as the COMPASS Study [16] in Canada, provide valuable infrastructure for evaluating such school-based interventions. However, this work requires robust methods of measuring (and meaningfully quantifying changes in) wellbeing among adolescents. More broadly, there is an expressed need in Canada for population-level indicators of positive mental health and wellbeing in surveillance research, and colleagues have worked to identify an appropriate measure [17]. While the measurement of concepts such as wellbeing is inherently challenging [18], several tools exist. The Scale of Positive and Negative Experience (SPANE) and the Flourishing Scale (FS), for example, were developed by Diener and colleagues [19] in light of contemporary theories of subjective wellbeing. Whereas researchers have traditionally been concerned with either hedonic (i.e., positive feelings) or eudemonic (i.e., positive functioning) perspectives [20], more recent theorists have argued in favour of an integrative approach to conceptualizing wellbeing [21–23], rather than one in which hedonia and eudemonia are considered mutually exclusive. Definitions of flourishing have referred to a state in which hedonic and eudemonic wellbeing are simultaneously present [21,24].
One question in psychometric research is whether a given scale measures the construct it was intended to measure (e.g., psychological wellbeing), and whether the construct is interpreted consistently despite differences in respondents' characteristics. This property, referred to as measurement invariance, is a prerequisite for meaningfully comparing a measure between groups [46], as it limits the risk of bias arising from differential interpretation [47]. To our knowledge, no previous studies have tested measurement invariance of the FS among adolescents in Canada. Using a large, school-based sample of Canadian students in the COMPASS Study [16], our primary objectives were twofold: first, to test measurement invariance of the FS between our study sample (i.e., adolescents) and Diener's original study sample (i.e., adults); and second, to test measurement invariance of the FS among adolescents by sociodemographic subgroups, namely gender, grade (age), and ethno-racial identity, given their relevance as determinants of health and wellbeing among adolescents. Our secondary objective was to estimate differences in FS scores across these adolescent subgroups (i.e., by gender, grade, and ethno-racial identity).

Design and Samples
The present study used student-level data from Year 7 (Y7; 2018–2019) of the COMPASS Study, a large prospective cohort study (2012–2021) of secondary school students in Canada [16]. Students self-reported various behavioural (e.g., physical activity, substance use) and mental (e.g., depression, anxiety) health indicators using the COMPASS student questionnaire (Cq), administered annually within participating schools during class time. In Y7, a total of 74,501 students across 136 schools (8 in Alberta, 15 in British Columbia, 51 in Ontario, 52 in Québec) completed the Cq. Data were collected using active-information, passive-consent protocols [48] approved by the University of Waterloo Office of Research Ethics and participating school boards. Further COMPASS Study details are available elsewhere in print [16] and online (www.compass.uwaterloo.ca).
Additional data were obtained from the original, multi-site validation study of Diener's FS [19]. The study sample included 689 adult participants recruited across six post-secondary institutions in the United States and Singapore. Using these data, the FS was shown to have good psychometric properties and strong convergent validity with existing measures of wellbeing [19]. Findings of the original FS validation are available in print [19].

Instrument
The FS [19] was included as a component of the Cq's mental health module (MH-M; [49,50]). The FS consists of the following statements: (1) I lead a purposeful and meaningful life, (2) my social relationships are supportive and rewarding, (3) I am engaged and interested in my daily activities, (4) I actively contribute to the happiness and wellbeing of others, (5) I am competent and capable in the activities that are important to me, (6) I am a good person and live a good life, (7) I am optimistic about my future, and (8) people respect me. In Diener's original FS [19], individuals rate their level of agreement with each statement on a 7-point Likert scale (1 = strongly disagree, 2 = disagree, 3 = slightly disagree, 4 = neither agree nor disagree, 5 = slightly agree, 6 = agree, 7 = strongly agree), such that sum scores range from 8 to 56 and higher scores indicate greater psychological wellbeing. To make the FS suitable for large-scale school data collection through COMPASS [49], the response options on the Cq were modified to a 5-point Likert scale (1 = strongly agree, 2 = agree, 3 = neither agree nor disagree, 4 = disagree, 5 = strongly disagree), yielding scores between 8 and 40 in which higher scores indicated poorer psychological wellbeing.
We reverse-coded the FS items in the Cq MH-M for consistency with Diener's original FS, and collapsed the strongly agree/agree and strongly disagree/disagree response options of the original FS for consistency with the 5-point Cq MH-M. As a result of these changes, composite FS scores in both samples ranged between 8 and 40, with higher scores indicating greater psychological wellbeing. Internal consistency of the FS was high in both the adolescent COMPASS Study sample (α = 0.87) and Diener's original sample (α = 0.87).
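The recoding and scoring steps can be sketched in Python. This is a minimal illustration, not the authors' SAS or Mplus code, and it rests on assumptions not stated in the text: each respondent's eight FS items are stored as a list of integers, Cq items are coded 1 = strongly agree through 5 = strongly disagree, and original FS items are coded 1 = strongly disagree through 7 = strongly agree. The 7-to-5 mapping below is one reading of the collapse described above (strongly disagree merged with disagree, agree merged with strongly agree). Cronbach's α uses the standard formula α = k/(k − 1) · (1 − Σ item variances / total-score variance).

```python
from statistics import variance


def reverse_code_cq(item: int) -> int:
    """Reverse a Cq item (assumed 1 = strongly agree ... 5 = strongly disagree)
    so that 5 = strongly agree and higher values mean greater wellbeing."""
    if not 1 <= item <= 5:
        raise ValueError(f"Cq item out of range 1-5: {item}")
    return 6 - item


# Assumed mapping for original 7-point items (1 = strongly disagree ... 7 = strongly agree):
# strongly disagree/disagree -> 1, agree/strongly agree -> 5, middle categories -> 2, 3, 4.
COLLAPSE_7_TO_5 = {1: 1, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 5}


def collapse_original(item: int) -> int:
    """Collapse an original 7-point FS item onto the 5-point scale."""
    return COLLAPSE_7_TO_5[item]


def fs_sum(items: list[int]) -> int:
    """Sum the eight recoded FS items; totals fall between 8 and 40."""
    if len(items) != 8:
        raise ValueError("the FS has exactly 8 items")
    return sum(items)


def cronbach_alpha(rows: list[list[int]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix, using sample variances."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

For example, a hypothetical Cq response pattern of [2, 1, 2, 3, 1, 2, 2, 1] reverse-codes to [4, 5, 4, 3, 5, 4, 4, 5], for a total FS score of 34.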
The Cq also allows students to self-report various sociodemographic characteristics, including gender ("male, female" in the Cq; hereafter referred to as "boy, girl") and grade (9, 10, 11, 12, and "other"). The "other" category included students in Québec enrolled in Secondaire I and II (equivalent to grades 7 and 8 in other provinces), as well as those enrolled in a secondary school class with no official grade equivalent (e.g., "new immigrant" classes in Québec). Note that there is no grade 12 in Québec. Students also reported their age and their weekly spending/saving money (CAD 0, 1–20, 20–100, 100+, don't know); the latter was used as a proxy measure of student-level socioeconomic status and part-time employment. Ethno-racial identity was reported by selecting one or more listed identities in the Cq. Using these responses, we re-categorized students as racialized (Black, Asian, Latin American, Indigenous, other, mixed) or non-racialized (white).
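The two derived categories can be expressed as small recoding functions. This sketch assumes hypothetical response labels (the Cq's actual option wording and variable names are not reproduced here) and treats a respondent who selected no identity as missing. Selecting only "White" yields non-racialized; any other single identity, or any combination of identities (the "mixed" group), yields racialized.

```python
from typing import Iterable, Optional, Union

# Hypothetical label; the Cq's actual response wording is an assumption here.
NON_RACIALIZED_LABEL = "White"


def ethno_racial_group(selected: Iterable[str]) -> Optional[str]:
    """Collapse one or more selected identities into racialized / non-racialized."""
    identities = set(selected)
    if not identities:
        return None  # nothing selected: treated as missing
    if identities == {NON_RACIALIZED_LABEL}:
        return "non-racialized"
    # Any other single identity or any combination of identities (mixed).
    return "racialized"


def grade_category(grade: Union[int, str]) -> str:
    """Keep grades 9-12; everything else (e.g., Secondaire I/II or classes
    without an official grade equivalent) becomes "other"."""
    return str(grade) if grade in (9, 10, 11, 12) else "other"
```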

Statistical Analysis Strategies
Measurement invariance of the FS was investigated using a four-step procedure in which increasingly stringent equality constraints were specified for model parameters between groups (e.g., adolescents vs. adults; boys vs. girls) within a multiple-group confirmatory factor analysis. In the first step, the configural model imposed no equality constraints on parameters and served as the baseline for subsequent tests [51]. Configural invariance suggests that the same underlying factor structure is observed in each comparison group. Second, the metric model examined whether the factor loadings for each item were equivalent between groups. Invariant factor loadings are a prerequisite for making valid group comparisons [52]. Third, the scalar model tested whether item intercepts were equivalent [46], so as to verify whether mean differences at the item level are fully explained by mean differences at the factor level. In the final step, the strict model tested whether the item residual variances (i.e., the error variances of each item's regression on the latent factor) were equivalent across groups. Strict invariance is required for defensible item-level comparisons [53]. Adding constraints systematically allows the identification of specific parameters that contribute to model misfit and, ultimately, to differences in the interpretation of the latent construct [47]. When model fit is adequate and changes in fit indices are negligible, equal factor loadings (i.e., metric invariance) suggest that groups attribute the same meaning to the construct; equal factor loadings and intercepts (i.e., scalar invariance) suggest that the meaning of the items that comprise the construct is the same between groups; and equal factor loadings, intercepts, and residuals (i.e., strict invariance) suggest that the explained variance is the same and the construct is measured identically.
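Why scalar invariance matters for comparing sum scores can be illustrated with the model-implied item means of a one-factor model, E[x_j] = τ_j + λ_j κ, where τ_j is the item intercept, λ_j the factor loading, and κ the group's latent mean. The sketch below also summarizes the cumulative constraints at each step. All parameter values are hypothetical, not estimates from either sample: two groups share the same latent mean, yet a single non-invariant intercept shifts the expected FS sum score by 0.5 points.

```python
# Cumulative equality constraints imposed at each step of the procedure.
INVARIANCE_STEPS = {
    "configural": (),
    "metric": ("loadings",),
    "scalar": ("loadings", "intercepts"),
    "strict": ("loadings", "intercepts", "residual variances"),
}


def expected_item_means(intercepts, loadings, latent_mean):
    """Model-implied item means: E[x_j] = tau_j + lambda_j * kappa."""
    return [tau + lam * latent_mean for tau, lam in zip(intercepts, loadings)]


def expected_sum_score(intercepts, loadings, latent_mean):
    """Expected composite (sum) score implied by the one-factor model."""
    return sum(expected_item_means(intercepts, loadings, latent_mean))


# Hypothetical parameters for an 8-item scale with invariant loadings.
loadings = [0.8] * 8
tau_a = [4.0] * 8
tau_b = [4.0] * 7 + [4.5]  # one non-invariant intercept in group B

# Same latent mean (0.0) in both groups, different expected sum scores.
same_latent_a = expected_sum_score(tau_a, loadings, 0.0)
same_latent_b = expected_sum_score(tau_b, loadings, 0.0)
print(same_latent_a, same_latent_b)  # 32.0 32.5
```

Without scalar invariance, the 0.5-point gap would be misread as a difference in wellbeing; once intercepts are equal, any difference in expected sum scores comes only from the latent means (e.g., κ = 0.5 in one group raises the expected sum by 8 × 0.8 × 0.5 = 3.2 points).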
We relied on two criteria to establish measurement invariance. The first required adequate model fit at each level of testing. We determined a priori that at least two of three fit indices (comparative fit index (CFI), standardized root mean square residual (SRMR), and root mean square error of approximation (RMSEA)) needed to meet established cut-points to declare adequate model fit [54–56]. The cut-points were CFI ≥ 0.950, SRMR ≤ 0.080, and RMSEA ≤ 0.080 [46]. The second criterion specified that changes in fit indices (i.e., from the model with fewer equality constraints on parameters to the more constrained model) must not exceed established cut-points. We determined a priori that at least two of ∆CFI, ∆SRMR, and ∆RMSEA needed to meet this criterion to establish measurement invariance at any given level of testing. Changes of ∆CFI ≤ −0.010, ∆SRMR ≥ 0.030, or ∆RMSEA ≥ 0.015 were taken to indicate a substantial worsening of model fit [57]. Because the χ² goodness-of-fit test and ∆χ² are highly sensitive to sample size, we did not rely on them as indices of model fit.
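The two a priori decision rules above can be written as small predicates. This Python sketch is an illustration, not the authors' code: it takes fit indices as already computed (e.g., read from SEM output), defines each change as the more constrained model's value minus the less constrained model's value, and, as an assumption, counts an unreported index as not meeting its rule.

```python
from typing import Callable, Optional


def _meets(value: Optional[float], rule: Callable[[float], bool]) -> bool:
    """An unreported index (None) is counted as not meeting its rule."""
    return value is not None and rule(value)


def adequate_fit(cfi=None, srmr=None, rmsea=None) -> bool:
    """Criterion 1: at least two of CFI >= 0.950, SRMR <= 0.080, RMSEA <= 0.080."""
    met = [
        _meets(cfi, lambda v: v >= 0.950),
        _meets(srmr, lambda v: v <= 0.080),
        _meets(rmsea, lambda v: v <= 0.080),
    ]
    return sum(met) >= 2


def negligible_change(d_cfi=None, d_srmr=None, d_rmsea=None) -> bool:
    """Criterion 2: at least two changes stay short of the worsening thresholds
    (dCFI <= -0.010, dSRMR >= 0.030, dRMSEA >= 0.015 signal substantial worsening)."""
    met = [
        _meets(d_cfi, lambda v: v > -0.010),
        _meets(d_srmr, lambda v: v < 0.030),
        _meets(d_rmsea, lambda v: v < 0.015),
    ]
    return sum(met) >= 2


def invariance_established(fit: dict, change: dict) -> bool:
    """Both criteria must hold for the more constrained model."""
    return adequate_fit(**fit) and negligible_change(**change)
```

Applied to the metric, scalar, and strict ∆CFI and ∆RMSEA values reported for the adolescent vs. adult comparison (with ∆SRMR passed as None), `negligible_change` returns True at every step.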
If model fit was inadequate at the configural level, we reviewed modification indices to identify potential residual correlations between similar items that could be specified to improve model fit. Where measurement invariance at a given level of testing was not established (i.e., model fit worsened substantially), we reviewed modification indices and identified constraints on non-invariant parameters that could be removed to improve model fit. We then tested the respecified model against the less constrained model and recomputed the change in fit indices. This approach, known as partial invariance, holds that only a subset of parameters needs to be equivalent for substantive comparisons between groups [58].
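The partial-invariance search can be sketched as a loop that frees constraints in order of their modification indices. Releasing one constraint at a time, largest index first, is a common convention assumed here rather than a detail given in the text, and `refit` is a hypothetical callable standing in for re-estimating the model in SEM software and applying the change criterion; parameter names such as "intercept_3" are likewise illustrative.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical: takes the set of equality-constrained parameters and returns
# (passes_change_criterion, modification_index_by_parameter).
RefitFn = Callable[[FrozenSet[str]], Tuple[bool, Dict[str, float]]]


def partial_invariance(
    constraints: Set[str], refit: RefitFn, max_release: int = 3
) -> Tuple[Optional[Set[str]], List[str]]:
    """Free one constraint at a time (largest modification index first) until the
    constrained model passes, or stop after max_release releases."""
    current = set(constraints)
    released: List[str] = []
    while True:
        passes, mod_indices = refit(frozenset(current))
        if passes:
            return current, released
        # Only constraints still in place are candidates for release.
        candidates = {p: mi for p, mi in mod_indices.items() if p in current}
        if not candidates or len(released) >= max_release:
            return None, released  # partial invariance not achieved
        worst = max(candidates, key=candidates.get)
        current.remove(worst)
        released.append(worst)
```

The returned set lists the parameters that remain equal across groups; comparisons of latent means are then made under this partially constrained model.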
Measurement invariance testing was conducted with robust standard errors [59,60] using Mplus version 6.11 (Muthén & Muthén, Los Angeles, CA, USA) [61]. Additional descriptive analyses (t-tests and one-way ANOVA) and parametric analyses (mixed linear regression) were conducted in SAS version 9.4 (SAS Institute Inc., Cary, NC, USA) [62]. Adjusted β-estimates (controlling for province and weekly spending/saving money) were reported alongside 95% confidence intervals. We calculated the intra-class correlation coefficient (ICC) to quantify the proportion of variation in students' FS scores attributable to school-level clustering (ICC = 0.033). Although this between-school variation was marginal, we adjusted for the clustering of students within schools in all measurement invariance tests and regression models. We used full information maximum likelihood estimation in Mplus and SAS (PROC MIXED) to preserve cases with missing data [63].
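The ICC is the share of total outcome variance that lies between clusters. This sketch shows two standard ways to obtain it: directly from random-intercept variance components, ICC = τ² / (τ² + σ²), and from raw grouped data using the one-way ANOVA estimator ICC(1) with an adjusted group size for unbalanced clusters. It is an illustration under those textbook definitions, not a reproduction of the authors' SAS computation; the example inputs are hypothetical.

```python
from statistics import fmean


def icc_from_components(between_var: float, within_var: float) -> float:
    """ICC = tau^2 / (tau^2 + sigma^2) from random-intercept variance components."""
    return between_var / (between_var + within_var)


def icc_anova(groups: list[list[float]]) -> float:
    """One-way ANOVA estimator ICC(1); may be negative when clustering is negligible."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    grand_mean = fmean(x for g in groups for x in g)
    group_means = [fmean(g) for g in groups]
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Adjusted average cluster size for unbalanced designs.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)
```

An ICC of 0.033 means that about 3.3% of the total variance in FS scores lies between schools, and the remaining 96.7% lies between students within schools.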

Sample Characteristics
Among COMPASS Y7 participants (N = 74,501), the mean FS score was 32.19 (SD = 5.72). Students' sum scores ranged from 8 to 40, with a median of 32; 41% (n = 28,553) of students reported FS scores below the mean. Half of students (50%) were girls, and 31% reported a racialized identity. Students' mean age was 15 years (SD = 1.5 years; range: 12 to 19 years), and roughly 81% were in grades 9–12. The majority of students were from Ontario (41%) or Québec (30%). Student characteristics are presented with mean FS scores in Table 1; FS total and item-level means are further shown by subgroup in Appendix A. Refer to Table 2 for FS score norms (as percentile rankings) among students in our sample.
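Percentile norms of the kind reported in Table 2 can be computed from the observed sum scores. The text does not state which percentile-rank definition was used, so this sketch assumes the common mid-rank convention: the percentage of scores below a given value plus half the percentage equal to it. The example scores are hypothetical.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(sorted_scores: list[int], score: int) -> float:
    """Mid-rank percentile: % of scores below `score` plus half of those equal to it."""
    below = bisect_left(sorted_scores, score)
    equal = bisect_right(sorted_scores, score) - below
    return 100.0 * (below + 0.5 * equal) / len(sorted_scores)


def norms_table(scores: list[int], lo: int = 8, hi: int = 40) -> dict[int, float]:
    """Percentile rank for every possible FS sum score from lo to hi."""
    ordered = sorted(scores)
    return {s: percentile_rank(ordered, s) for s in range(lo, hi + 1)}
```

For the hypothetical scores [8, 20, 32, 32, 40], a score of 32 has a percentile rank of 60: two scores fall below it and two are tied with it, giving (2 + 0.5 × 2) / 5 × 100.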

Measurement Invariance by Sample
Because the FS is scored as a single composite total, we proceeded directly to fitting a single-factor model in the confirmatory factor analysis. Fit of the one-factor model was good among both the adolescent and adult samples (Table 3). As shown in Table 3, equality constraints placed on the factor loadings (metric model) did not substantially worsen model fit (∆CFI = 0.000; ∆RMSEA = −0.006). Similar results were found when constraining item intercepts (scalar model: ∆CFI = −0.006; ∆RMSEA = 0.000) and residuals (strict model: ∆CFI = −0.001; ∆RMSEA = −0.003).

Measurement Invariance by Gender, Grade, and Ethno-Racial Identity
Having established full measurement invariance between adolescents and adults, we tested invariance across gender, grade, and ethno-racial identity within the adolescent sample (Table 4). The configural models (no equality constraints) demonstrated excellent fit to the data for gender

Flourishing Scale Scores Among COMPASS Y7 Students
Results from a mixed linear regression model of FS scores are presented in Table 5 (Model I); each β-estimate represents the average difference in FS score (in points) relative to the reference group. Girls had lower average FS scores than boys (β = −0.88, p < 0.0001), as did students who reported a racialized identity compared to those who did not (β = −0.83, p < 0.0001). Compared to students in grade 9, average FS scores were lower in grades 10 (β = −0.35, p < 0.0001), 11 (β = −0.54, p < 0.0001), and 12 (β = −0.63, p < 0.0001), but higher for those in the "other" grade category (β = 0.92, p < 0.0001). All estimates are adjusted for province and weekly spending/saving money.

Discussion
Using recent data collected from a large sample of secondary school students enrolled in the COMPASS Study, our findings support the validity of the FS among the Canadian adolescent population. We demonstrated full measurement invariance between our sample and the adult sample among whom the FS was originally validated [19], thereby confirming that the FS in fact measures the same construct (i.e., psychological wellbeing) among adolescents. Additionally, we identified strict invariance in FS measurement across gender, grade, and ethno-racial identity. This finding confirms that statistical differences observed between student subgroups within the COMPASS Study represent true differences, rather than artifactual differences in interpretation of the FS. Given the importance of psychological wellbeing to adolescent health and strong psychometric properties demonstrated by the FS, future youth-focused surveillance research should strongly consider incorporating the FS as a measure of youth wellbeing.
While the FS has previously been validated across a number of populations and languages, our present work fills important gaps in the existing literature. Our study is the first to test measurement invariance against the sample of participants used in the original development and validation of the FS. To our knowledge, the COMPASS Study also represents the largest sample in which FS validity has been investigated, followed by a New Zealand study that relied on a nationally representative sample of 10,009 adults [45]. With two exceptions, existing FS validation studies have focused solely on adult populations and/or university students: Singh, Junnarkar, and Jaswal found good fit of a single-factor structure for the FS items among adolescents in India [64], as did Duan and Xie among 12-17-year-old adolescents in China [32]. However, findings from these studies may not generalize to Canadian adolescents.
Having established measurement invariance across gender, grade, and ethno-racial identity among students in the COMPASS Study, thereby further extending the validity of the FS, we examined differences in FS scores across these subgroups. First, we identified lower levels of psychological wellbeing among girls than boys. This finding is consistent with existing research indicating relatively poorer wellbeing among adolescent girls compared to boys [65–67], yet is inconsistent with findings from a smaller Canadian study in which female undergraduate students scored higher on the FS than males once validity was established [26]. Other studies have found no differences by sex among youth [32]. While previous studies of the FS have demonstrated measurement invariance by sex and gender [27,32,37], few have gone on to test for differences that describe gendered experiences of wellbeing. Our findings may reflect the ways in which girls are disproportionately affected by socio-cultural norms and pressures during adolescence, experiences that may in turn influence their wellbeing [68].
We also found a pattern of decreasing psychological wellbeing with increasing secondary school grade. Consistent with other school-based and youth health literature, positive indicators of wellbeing generally appear to decrease with age from adolescence toward young adulthood [65,66], perhaps as a function of factors including perceived academic and social stress and pubertal and psychosocial development. As grade and age are highly correlated among students in the COMPASS Study, we used grade as a proxy for age in our analyses to improve the interpretability of our findings for school-based knowledge users. Because measurement invariance of the FS by grade was established in a sample ranging in age from 12 to 19 years, the FS appears capable of detecting age-related differences in psychological wellbeing among adolescents.
After confirming measurement invariance by ethno-racial identity among COMPASS Y7 students, we found that racialized students reported lower average psychological wellbeing than non-racialized students. An existing body of knowledge links racial and ethnic discrimination to various health outcomes and disparities [69,70], as well as to self-perceived and psychological wellbeing [71–73]. Here, we provide evidence of measurement invariance of the FS by ethno-racial identity. It is important to note that, due to sample size restrictions, we were unable to assess wellbeing across the specific racial or ethnic identity groups self-reported by students. However, our findings continue to highlight the ways in which individuals' experiences of wellbeing may be differentially affected by socio-political and systemic processes of racism, discrimination, and stigmatization, even among adolescents in Canada.

Strengths and Limitations
These findings are primarily strengthened by the nature of our data. The COMPASS Study represents the largest school-based sample of adolescents in Canada, with data collected across several provinces (Alberta, British Columbia, Ontario, Québec) through a hierarchical design. Although we were unable to test convergent validity of the FS with other measures of wellbeing, we were uniquely able to use existing data originally collected by Diener and colleagues [19] to validate the FS, allowing us to establish measurement invariance of the FS across adolescents and adults. Analytically, our findings are further strengthened by (1) our use of full information maximum likelihood to handle missing data, rather than relying on complete-case analysis, and (2) our multilevel modelling approach to account for variability due to school-level clustering.
We note some limitations. First, these self-reported data are not nationally representative, which limits the generalizability of our findings to all adolescents in Canada. However, the COMPASS Study relies on purposive sampling procedures that contribute to our large sample size, and its active-information, passive-consent data collection protocols help mitigate bias introduced by students' self-reporting [48,74–78]. Second, while we assessed differences in wellbeing by gender and ethno-racial identity, we were not able to account for non-binary gender identities or heterogeneity among racialized groups. Moreover, we did not investigate interactions across subgroups of gender, grade, and ethnicity/race; intersectional analyses are necessary to further understand differential experiences of wellbeing. Third, our findings are relevant only to secondary school-aged adolescents in Canada, not to younger students. Future studies are needed to investigate the applicability of the FS for measuring wellbeing among primary and elementary school-aged children.

Implications
These findings have practical implications for health researchers and practitioners interested in using the FS to measure wellbeing among adolescents. We have shown that the FS can be used as a valid tool for assessing and monitoring the psychological wellbeing of Canadian adolescents, as it not only measures the construct as originally intended but does so consistently despite differences in students' gender, grade, and ethno-racial identity. Data collected using the FS can inform targeted interventions meant to promote the overall psychological wellbeing of adolescents within secondary school settings. Because our findings are situated within the secondary school context, they also suggest that the FS could serve as an indicator of the success of school-based interventions and programs. Through robust data systems such as the COMPASS Study, the impact of these interventions and policies (e.g., at the school or provincial level) can be evaluated in real time as natural experiments [16,79].

Conclusions
In summary, our findings demonstrate full measurement invariance of the FS between study samples and across adolescents' gender, grade, and ethno-racial identity. These findings further support the validity of the FS for measuring psychological wellbeing among Canadian adolescents in the secondary school context. Further, this study provides evidence that efforts to improve psychological wellbeing should especially consider the needs of adolescent girls, students in older secondary school grades, and racialized students. Using existing data systems such as the COMPASS Study, Canadian programs and interventions that target students' wellbeing can be evaluated as robust yet feasible natural experimental studies.

Funding:
The COMPASS study has been supported by a bridge grant from the CIHR Institute of Nutrition, Metabolism and Diabetes (INMD) through the "Obesity-Interventions to Prevent or Treat" priority funding awards (OOP-110788; awarded to S.L.), an operating grant from the CIHR Institute of Population and Public Health (IPPH) (MOP-114875; awarded to S.L.), a CIHR project grant (PJT-148562; awarded to S.L.), a CIHR bridge grant (PJT-149092; awarded to K.P./S.L.), a CIHR project grant (PJT-159693; awarded to K.P.), and a research funding arrangement with Health Canada (#1617-HQ-000012; contract awarded to S.L.). The COMPASS-Quebec project additionally benefits from funding from the Ministère de la Santé et des Services sociaux of the province of Québec and the Direction régionale de santé publique du CIUSSS de la Capitale-Nationale.

Conflicts of Interest:
The authors declare no conflict of interest.