Psychometric Properties of the Multidimensional Temperance Scale in Adolescents

Recent research has shown the relevance of measuring the virtue of temperance. The present study tested a multidimensional, second-order structure for a temperance scale based on a subscale of the Values in Action Inventory of Strengths for Youth (VIA-Youth). Scale properties were tested using data from a sample of 860 adolescents aged 12 to 18 years (M = 14.28 years, SD = 1.65). The sample was randomly split into two subsamples. Using the first subsample, we assessed scale dimensionality, measurement invariance, and discriminant and concurrent validity; the second subsample was used for model cross-validation. Confirmatory factor analysis confirmed the fit of a model with one second-order temperance factor and four first-order dimensions: forgiveness, modesty, prudence, and self-control. The results indicate measurement equivalence of the scale across gender and stage of adolescence (early vs. middle). Latent mean difference tests showed significant gender differences in forgiveness, modesty, and self-regulation, and a significant difference in modesty by stage of adolescence. Moreover, the scale showed discriminant and concurrent validity. These findings indicate that this scale is useful for assessing temperance in adolescents and support conceptualizing temperance as a multidimensional, second-order construct.


Introduction
Virtues are central attributes that are highly appreciated in philosophy and religious theories worldwide, since they favor the optimal functioning of people [1,2]. Temperance, one of these identified virtues [3,4], contributes to a wide variety of positive consequences, such as individuals' well-being and the achievement of goals [5][6][7]. As a result, the interest among scholars in measuring this virtue has seen exponential growth in recent years [8][9][10][11].
Temperance involves regulating emotions, behavior, and motivation [2,12]. According to the literature [2,13], this virtue encompasses the strengths of modesty (avoiding flaunting and permitting personal accomplishments to speak for themselves), self-regulation (regulating behaviors and feelings), forgiveness (leaving aside anger or revenge towards the offender), and prudence (being cautious with individual decisions and avoiding actions one may regret). Some scholars have adopted the positive psychology approach [6,14] to research this virtue because the approach embraces the scientific study of positive human functioning and adaptive behaviors at all levels, such as personal, relational, and institutional [1,15].

Measures of Temperance
Temperance is recognized as a crucial trait related to adolescents' positive personal and academic outcomes [16][17][18][19]. The growing interest in studying virtues in adolescence has focused attention on the Values in Action Inventory of Strengths for Youth (VIA-Youth) [20]. This widely used measure includes a subscale for the virtue of temperance [21][22][23], consisting of four first-order factors: forgiveness, modesty, prudence, and self-regulation [20]. However, several issues remain. First, research shows that the factorial structure of the scale is inconsistent: some studies reported a three-factor structure [9,23], and others a four-factor [20,21], five-factor [24], or even six-factor structure [19]. In addition, a study conducted by Van Eeden et al. [22] found no clustering of the strengths, contradicting the theory. Second, evidence for second-order models is scarce [21,23]. Third, most studies have conducted exploratory factor analyses or principal component analyses [9,19,24] using total strength scores instead of the scale items; as a result, the factor loadings of individual items were not reported. Finally, studies conducted within the Mexican context are limited and have focused only on the adult population [25,26].

Measurement Invariance
Although the empirical evidence is still inconclusive, the current literature suggests that temperance differs by gender and age. Some studies [21,27-30] report higher temperance scores in males, whereas others [19,31-33] report higher levels in females. Findings regarding age are similarly contradictory: some studies indicate that temperance correlates positively with age [10,31,32,34], others have found no association between these variables [35,36], and it has recently been found that temperance decreases during adolescence. However, these findings should be interpreted with caution, since these studies did not report measurement invariance when examining group differences. Establishing measurement invariance is necessary to make meaningful comparisons between group means and to ensure that group differences reflect differences in the latent variables [37,38]. Therefore, it is essential to examine the measurement equivalence of the temperance scale by gender and stage of adolescence in order to make meaningful group comparisons on the temperance dimensions of self-control, forgiveness, prudence, and modesty.

The Present Study
The measurement of temperance has several potential weaknesses: (a) few studies have examined the fit of a second-order factor model to the data; (b) to the authors' knowledge, no study has examined measurement invariance according to gender and stage of adolescence, although prior research suggests that temperance may differ by gender and age [27,32]; (c) studies evaluating the discriminant and concurrent validity of temperance are scarce; and (d) to the authors' knowledge, no study has examined the psychometric properties of a multidimensional temperance scale in Mexican adolescents. To address these gaps, in this study we proposed: (1) examining the dimensionality of a second-order model with four first-order factors (see Figure 1; see Table 1); (2) examining scale measurement invariance by gender and stage of adolescence (early vs. middle); (3) comparing latent mean differences across groups, if scale measurement invariance is confirmed; (4) assessing discriminant validity by analyzing the relationships between the subscales; and (5) examining concurrent validity by testing the correlations between the dimensions of the temperance scale and bullying aggression (proactive and reactive). To accomplish these purposes, we considered five hypotheses.
Hypothesis 1 (internal structure): the indicators used to measure temperance reveal a second-order factor structure with four first-order factors (forgiveness, modesty, prudence, and self-control) that fits the data. Hypothesis 2 (measurement invariance): the scale shows robust invariance across gender and stages of adolescence. Hypothesis 3 (latent means): because previous studies are inconclusive, no directional hypothesis about gender or adolescence-stage differences was formulated. Hypothesis 4 (discriminant validity): each subscale of the temperance scale measures a construct that is empirically distinct from the other, conceptually similar subscales. Hypothesis 5 (concurrent validity): the dimensions of the temperance scale are negatively related to proactive and reactive bullying aggression.

Participants
Participants were students from 32 public secondary schools and 32 high schools in three cities in Sonora, Mexico. These schools typically serve students of low and middle socioeconomic status. The study sample comprised 860 adolescent students, 406 (47.2%) males and 454 (52.8%) females, aged 12 to 18 years; 430 (50%) were early adolescents (M age = 12.79 years, SD = 0.07) and 430 (50%) were middle adolescents (M age = 16.58 years, SD = 0.06). The sample was randomly split into two subsamples for model calibration (n = 430) and cross-validation (n = 430).
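The random split into calibration and cross-validation subsamples can be sketched as follows. This is a minimal illustration assuming a simple (unstratified) random split of participant identifiers; the seed value and the function name `split_half` are hypothetical, and the source does not specify how the split was implemented.

```python
import random

def split_half(ids, seed=2020):
    """Randomly split participant IDs into two equal-sized subsamples.

    The seed is an arbitrary, hypothetical value; fixing it makes the split
    reproducible. With an odd number of IDs, the second subsample gets the extra one.
    """
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Usage with 860 hypothetical participant IDs
calibration, validation = split_half(range(1, 861))
print(len(calibration), len(validation))  # 430 430
```

Fixing the seed lets the same calibration and validation subsamples be regenerated when the analyses are rerun.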

Temperance
A subscale of temperance virtue (TV) of the Values in Action Inventory of Strengths for Youth [20] (VIA-Youth; Spanish version) was used; temperance is a virtue that encompasses strengths focused on controlling excesses. The scale includes four dimensions: forgiveness, which involves leaving aside resentment or revenge and feeling benevolence toward the offender (4 items, e.g., I am a forgiving person); modesty, which implies avoiding flaunting and letting personal accomplishments speak for themselves (4 items, e.g., I never brag or flaunt my accomplishments); prudence, which includes being careful with personal decisions and avoiding speaking or behaving in ways that may be regretted (4 items, e.g., I think about the consequences of my behavior before I act); and self-regulation, which involves the ability to regulate actions and emotions and to resist temptations (4 items, e.g., I can control my anger quite well). Responses were given on a five-point Likert scale (0 = not like me at all to 4 = very much like me).
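As an illustration of how subscale scores might be computed from the 0-4 responses, the sketch below averages items within each dimension after reverse-scoring negatively worded items. The item-to-dimension mapping (items 1-4, 5-8, 9-12, and 13-16) and the set of reverse-keyed items (1, 10, 14) are assumptions for illustration only; they are not given in the source and should be replaced with the actual VIA-Youth scoring key. Note also that the final model in this study retained only 12 of the 16 items, so a reduced mapping would be needed to score the retained items.

```python
from statistics import mean

# Hypothetical item-to-dimension mapping (item numbers 1-16); replace with the
# actual questionnaire key.
DIMENSIONS = {
    "forgiveness": [1, 2, 3, 4],
    "modesty": [5, 6, 7, 8],
    "prudence": [9, 10, 11, 12],
    "self_regulation": [13, 14, 15, 16],
}
# Assumed negatively worded items that must be reverse-scored.
REVERSE_KEYED = {1, 10, 14}
SCALE_MAX = 4  # responses range from 0 (not like me at all) to 4 (very much like me)

def score_temperance(responses, dimensions=DIMENSIONS, reverse=REVERSE_KEYED):
    """Return the mean item score for each dimension.

    `responses` maps item number -> raw response on the 0-4 scale.
    Reverse-keyed items are recoded as SCALE_MAX - response before averaging.
    """
    scores = {}
    for name, items in dimensions.items():
        values = [
            SCALE_MAX - responses[i] if i in reverse else responses[i]
            for i in items
        ]
        scores[name] = mean(values)
    return scores

# Usage: a respondent who answered 3 to every item
print(score_temperance({i: 3 for i in range(1, 17)}))
```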

Procedure
First, the study received ethical clearance from the Ethical Research Committee of the Technological Institute of Sonora (Authorization number: PROFAPI_ 2020_0018). We then obtained authorization from the school authorities to conduct the study. In a virtual meeting organized by the teachers, we informed the students' parents about the purpose of the research. A consent letter was then sent by email to request parents' authorization for their children to respond to the questionnaires. Only 3% of parents declined their children's participation. Once approvals were obtained, students were invited to participate voluntarily. Data were collected through online surveys, which took approximately 20 to 30 min to complete.

Data Analysis
We verified that missing data (less than 5%) were missing completely at random and treated them using the multiple imputation methods available in SPSS 25 (IBM Corp., Armonk, NY, USA). Descriptive statistics (means, standard deviations, skewness, and kurtosis) were computed for the items. Then, an unconditional random-effects model was estimated to examine the school dependency of temperance and bullying aggression. The results suggested that differences in temperance (Wald z statistic = 1.68, p = 0.092; intraclass correlation coefficient ICC = 0.04) and aggression (Wald z statistic = 1.24, p = 0.214; ICC = 0.05) were not dependent on school [61,62]. Confirmatory factor analyses (CFA) were conducted in AMOS 25 (IBM Corp., Armonk, NY, USA) using maximum likelihood estimation with Bollen-Stine bootstrapping and bias-corrected bootstrap confidence intervals (500 replicates, 95% CI). These procedures were chosen because the Mardia coefficient value of 9.47 suggested multivariate non-normality. Bootstrapping is a robust procedure for dealing with non-normality in multivariate data [63][64][65].
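Mardia's multivariate kurtosis, which motivated the choice of bootstrap estimation, can be computed directly from the item matrix. The sketch below assumes complete (e.g., already imputed) data and returns Mardia's coefficient b2,p, its excess over the normal-theory value p(p + 2), and the normalized estimate. The source does not state whether the reported value of 9.47 is the excess kurtosis or its normalized critical ratio, so both are returned; the function name `mardia_kurtosis` is hypothetical.

```python
import numpy as np

def mardia_kurtosis(data):
    """Mardia's multivariate kurtosis for an (n, p) data matrix.

    Returns (b2p, excess, z): b2p is Mardia's coefficient, excess = b2p - p(p+2),
    and z = excess / sqrt(8p(p+2)/n), approximately N(0, 1) under multivariate
    normality.
    """
    x = np.asarray(data, dtype=float)
    n, p = x.shape
    centered = x - x.mean(axis=0)
    cov = centered.T @ centered / n  # maximum likelihood (divide-by-n) covariance
    inv_cov = np.linalg.inv(cov)
    # Squared Mahalanobis distance of each case from the centroid
    d2 = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    b2p = float(np.mean(d2 ** 2))
    excess = b2p - p * (p + 2)
    z = excess / np.sqrt(8 * p * (p + 2) / n)
    return b2p, excess, float(z)

# Usage with simulated data: normal items vs. heavy-tailed items
rng = np.random.default_rng(0)
print(mardia_kurtosis(rng.standard_normal((430, 12)))[2])
print(mardia_kurtosis(rng.standard_t(3, size=(430, 12)))[2])
```

With the 12 retained items, one would pass an (n, 12) array of item responses; large positive normalized values point to multivariate non-normality.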

Dimensionality
To assess the dimensionality of the temperance scale, we first evaluated the goodness of fit of a model with four first-order factors (Model A). After establishing the fit of the four-factor first-order measurement model, we tested a model in which these four factors served as indicators of a second-order temperance factor, to assess whether the first-order factors could be represented by a single second-order factor. To estimate the models' global goodness of fit, we used the χ² statistic and its associated probability, and the Bollen-Stine bootstrap probability. Since χ² and the Bollen-Stine bootstrap are sensitive to large samples [66][67][68], we also reported the standardized root mean square residual (SRMR), comparative fit index (CFI), Tucker-Lewis index (TLI), and root mean square error of approximation (RMSEA) with its confidence interval. The structural equation modeling (SEM) literature suggests that model fit is adequate when the χ² test has p > 0.001, Bollen-Stine p > 0.05, CFI ≥ 0.95, and TLI ≥ 0.90. For SRMR and RMSEA, a value ≤ 0.05 indicates excellent fit, and a value ≤ 0.08 indicates acceptable fit [38,69]. Differences in χ² (∆χ²) and in the Bayesian information criterion (∆BIC) were used to compare models. When the ∆χ² value is significant, the model with the lower χ² fits the data better [38,70]. BIC differences > 10 indicate meaningful differences in model fit, and the model with the greater BIC fits more poorly [38,71,72].
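The nested-model comparison described above can be sketched with SciPy's chi-square survival function. This minimal example assumes that the model χ² values, degrees of freedom, and BIC values are read from the SEM output (e.g., AMOS); the function names are hypothetical, and the simple difference test shown here uses the ordinary maximum likelihood χ² rather than any bootstrap correction.

```python
from scipy.stats import chi2

def chi_square_difference(chisq_restricted, df_restricted, chisq_free, df_free):
    """Chi-square difference (likelihood-ratio) test for two nested models.

    The restricted model is the one with more degrees of freedom (fewer free
    parameters). Returns (delta_chisq, delta_df, p_value).
    """
    delta_chisq = chisq_restricted - chisq_free
    delta_df = df_restricted - df_free
    if delta_df <= 0:
        raise ValueError("the restricted model must have more degrees of freedom")
    p_value = chi2.sf(delta_chisq, delta_df)  # upper-tail probability
    return delta_chisq, delta_df, float(p_value)

def bic_preference(bic_a, bic_b, threshold=10.0):
    """Return "A" or "B" for the preferred model when |BIC_A - BIC_B| > threshold, else None."""
    delta = bic_a - bic_b
    if abs(delta) <= threshold:
        return None  # difference too small to favor either model
    return "B" if delta > 0 else "A"  # the model with the larger BIC fits worse

# Usage with hypothetical fit values
print(chi_square_difference(100.0, 52, 88.0, 50))
print(bic_preference(1000.0, 1015.0))
```

With these hypothetical values, ∆χ² = 12 on 2 df gives p ≈ 0.0025, and a BIC difference of 15 favors model A.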

Measurement Invariance
Nested models were tested according to the procedure suggested in the literature [76,77]. We first tested a configural baseline model, which specified the same number of factors in each group (configural invariance). When the baseline model fit each group, we tested the invariance of the factor loadings across groups (metric invariance). Once metric invariance was verified, we evaluated the invariance of the measurement intercepts (scalar invariance). A non-significant ∆χ² (p > 0.001) suggests that the measurement model is equivalent across groups [38,77]. However, the ∆χ² statistic is sensitive to sample size [77,78]; thus, scholars have advocated using differences in goodness-of-fit indexes, such as differences in CFI (∆CFI) and in RMSEA (∆RMSEA). We followed the values proposed by scholars [77,79], who assert that differences greater than 0.01 in CFI and 0.015 in RMSEA indicate a meaningful difference in model fit when testing invariance. When the two procedures disagreed, we relied on ∆CFI and ∆RMSEA because of the large sample used in this study [77][78][79]. If scalar invariance was confirmed, we calculated the latent mean differences between groups. For this, the latent means of the reference groups (males and early adolescents) were fixed to zero. We used a z statistic to compare latent means [38,76].
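The decision rule for each step of the invariance sequence can be written as a small function. This is a sketch under the criteria stated above (∆χ² with p > 0.001, ∆CFI ≤ 0.01, ∆RMSEA ≤ 0.015, with the fit-index criteria taking precedence when the two disagree); the function and key names are hypothetical.

```python
from scipy.stats import chi2

def invariance_supported(delta_chisq, delta_df, delta_cfi, delta_rmsea,
                         alpha=0.001, cfi_cut=0.01, rmsea_cut=0.015):
    """Judge one step of a nested invariance sequence (e.g., metric vs. configural).

    delta_cfi and delta_rmsea are changes between the constrained and the less
    constrained model. Because the fit-index criteria decide whenever they
    disagree with the chi-square test, the final verdict equals `indices_ok`;
    the chi-square result is reported for completeness.
    """
    p_value = float(chi2.sf(delta_chisq, delta_df))
    chisq_ok = p_value > alpha
    indices_ok = abs(delta_cfi) <= cfi_cut and abs(delta_rmsea) <= rmsea_cut
    return {"p": p_value, "chisq_ok": chisq_ok,
            "indices_ok": indices_ok, "invariant": indices_ok}

# Usage: the metric and scalar steps reported for stage of adolescence
print(invariance_supported(11.82, 12, 0.001, 0.001))
print(invariance_supported(44.52, 41, 0.002, 0.002))
```

Applied to the metric step reported for stage of adolescence (∆χ² = 11.82, df = 12, ∆CFI = 0.001, ∆RMSEA = 0.001), the function gives p ≈ 0.46 and supports invariance.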

Discriminant Validity
Discriminant validity confirms that the constructs are empirically unique [80,81]. Campbell [82] suggests that it ensures that a latent variable is "not correlated too highly with measures from which it is supposed to differ" (p. 6). Based on the literature, we assumed that discriminant validity is confirmed when the average variance extracted (AVE) of each factor is greater than the squared correlation between that factor and each of the other scale factors [81,83].
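The criterion above (often called the Fornell-Larcker criterion) compares each factor's AVE, the mean of its squared standardized loadings, with its squared correlations with the other factors. The sketch below assumes standardized loadings and latent correlations taken from the fitted CFA; the function names and the example values are hypothetical.

```python
import numpy as np

def average_variance_extracted(std_loadings):
    """AVE = mean of the squared standardized loadings of a factor's items."""
    lam = np.asarray(std_loadings, dtype=float)
    return float(np.mean(lam ** 2))

def fornell_larcker(loadings_by_factor, factor_corr):
    """Check AVE_i > r_ij**2 for every other factor j.

    loadings_by_factor: dict name -> list of standardized loadings.
    factor_corr: (k, k) latent correlation matrix in the same order as the dict.
    Returns (dict name -> bool, dict name -> AVE).
    """
    names = list(loadings_by_factor)
    corr = np.asarray(factor_corr, dtype=float)
    ave = {name: average_variance_extracted(loadings_by_factor[name]) for name in names}
    result = {}
    for i, name in enumerate(names):
        squared = [corr[i, j] ** 2 for j in range(len(names)) if j != i]
        result[name] = all(ave[name] > s for s in squared)
    return result, ave

# Usage with hypothetical loadings and a latent correlation of 0.50
ok, ave = fornell_larcker({"a": [0.8, 0.8, 0.8], "b": [0.7, 0.7, 0.7]},
                          [[1.0, 0.5], [0.5, 1.0]])
print(ok, ave)
```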

Concurrent Validity
Concurrent validity requires that scale scores correlate as hypothesized with other constructs measured at the same time [84]. To test concurrent validity, we calculated the correlations between the temperance dimensions and reactive and proactive bullying aggression. Values of r from 0.10 to 0.19 indicate a small effect, values from 0.20 to 0.29 a medium effect, and values of 0.30 or greater a large effect [85].
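The effect-size thresholds above can be applied programmatically. In this sketch, the absolute value of r is classified so that negative correlations (as expected here) are labeled by magnitude; treating the boundaries as inclusive lower bounds and labeling |r| < 0.10 as "negligible" are assumptions, since the text does not cover those cases. The function names and data arrays are hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between two equal-length score vectors."""
    return float(np.corrcoef(x, y)[0, 1])

def effect_size_label(r):
    """Classify |r|: >= 0.30 large, >= 0.20 medium, >= 0.10 small (assumed boundaries)."""
    magnitude = abs(r)
    if magnitude >= 0.30:
        return "large"
    if magnitude >= 0.20:
        return "medium"
    if magnitude >= 0.10:
        return "small"
    return "negligible"  # label assumed; the text does not name this range

# Hypothetical modesty scores and reactive aggression scores for five students
modesty = [3.0, 2.5, 3.5, 1.5, 2.0]
reactive = [0.5, 1.0, 0.0, 2.5, 1.5]
r = pearson_r(modesty, reactive)
print(round(r, 2), effect_size_label(r))
```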

Model Cross-Validation
We used cross-validation to test whether the model dimensionality obtained in the calibration sample (n = 430) replicated in an independent sample of adolescents (n = 430). A multigroup analysis was used to assess model replicability: we compared the unconstrained model with a model in which the factor loadings and variances/covariances were constrained to be equal across samples. Based on the SEM literature, we considered factorial invariance confirmed when ∆χ² was not significant (p > 0.001), ∆CFI ≤ 0.01, and ∆RMSEA ≤ 0.05. Because the χ² statistic is sensitive to large samples and departures from normality, we relied on the ∆CFI and ∆RMSEA values when the results were contradictory.

Descriptive Item Analysis
The responses suggested that adolescents exhibit a moderate level of temperance. Item skewness and kurtosis values indicated approximately normal univariate distributions for all items (see Table 2).
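Item-level descriptive statistics of the kind summarized in Table 2 can be computed with SciPy. The cutoffs used here (|skewness| < 2 and |excess kurtosis| < 7) are commonly cited rules of thumb but are assumptions, since the source does not state which criteria were applied; note that SciPy's `kurtosis` returns excess (Fisher) kurtosis by default. The function name is hypothetical.

```python
import numpy as np
from scipy.stats import kurtosis, skew

def item_distribution_summary(items, skew_cut=2.0, kurt_cut=7.0):
    """Describe each column of an (n_respondents, n_items) response matrix.

    skew_cut and kurt_cut are assumed rule-of-thumb limits, not values from the source.
    """
    x = np.asarray(items, dtype=float)
    skews = skew(x, axis=0)
    kurts = kurtosis(x, axis=0)  # excess (Fisher) kurtosis: 0 for a normal distribution
    summary = []
    for j in range(x.shape[1]):
        summary.append({
            "item": j + 1,
            "mean": float(x[:, j].mean()),
            "sd": float(x[:, j].std(ddof=1)),
            "skewness": float(skews[j]),
            "kurtosis": float(kurts[j]),
            "approx_normal": bool(abs(skews[j]) < skew_cut and abs(kurts[j]) < kurt_cut),
        })
    return summary

# Usage: two hypothetical items answered 0, 1, 2, 3, 4 by five respondents
rows = item_distribution_summary(np.tile(np.arange(5).reshape(-1, 1), (1, 2)))
print(rows[0])
```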

Dimensionality
The initial model with four first-order factors (Model A) did not fit the data (see Table 3). Therefore, we improved the model's fit based on an analysis of factor loadings and modification indices. The literature suggests that an item's factor loading should be 0.6 or higher [67,74,86] for the item to be considered salient. On this basis, item 1 ("I often stay mad at people even when they apologize"; standardized factor loading = 0.11), item 5 ("I am not a show-off"; standardized factor loading = 0.04), item 10 ("I often find myself doing things that I know I shouldn't be doing"; standardized factor loading = 0.42), and item 14 ("My temper often gets the best of me"; standardized factor loading = 0.17) were removed from the model. In addition, considering the modification indices (MI > 5) and theoretical issues [38,74], we added three error covariances.
Note: χ², chi-square; df, degrees of freedom; p, associated probability; SRMR, standardized root mean square residual; CFI, comparative fit index; TLI, Tucker-Lewis index; RMSEA, root mean square error of approximation; BIC, Bayesian information criterion.
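The loading-based screening step can be expressed as a simple filter. The sketch below applies the 0.6 cutoff to the standardized loadings reported above for items 1, 5, 10, and 14; comparing absolute values (so that negatively loading items are judged by magnitude) is an assumption, and the actual decision also weighed theoretical considerations and modification indices, which this filter does not capture. The function name is hypothetical.

```python
def flag_weak_items(std_loadings, cutoff=0.6):
    """Return item numbers whose absolute standardized loading is below `cutoff`."""
    return sorted(item for item, lam in std_loadings.items() if abs(lam) < cutoff)

# Standardized loadings reported in the text for the four removed items
reported = {1: 0.11, 5: 0.04, 10: 0.42, 14: 0.17}
print(flag_weak_items(reported))  # [1, 5, 10, 14]
```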
These changes significantly improved the fit of the model (see Table 4). The goodness-of-fit indices suggest an acceptable fit of the respecified four-first-order-factor model (Model B). We then compared the four-first-order-factor model (Model B) with a second-order model (Model C) comprising the same four first-order factors. The fit of the second-order factor model (Model C) to the data was statistically better than that of the four-first-order-factor model, ∆χ² = 11.25, df = 2, p < 0.001; ∆BIC = 11.25. Therefore, based on theoretical and empirical findings suggesting that temperance is a virtue comprising several strengths, we chose Model C over the alternatives, and the results described below are based on this model.
Note: df, degrees of freedom; ∆χ², difference in chi-square; ∆df, difference in degrees of freedom; ∆CFI, difference in comparative fit index; ∆RMSEA, difference in root mean square error of approximation.

Measurement Invariance by Stage of Adolescence
The baseline model fit the data (configural invariance), χ² = 225.16, df = 186, p = 0.026; Bollen-Stine bootstrapping p = 0.052; SRMR = 0.05; TLI = 0.98; CFI = 0.98; RMSEA = 0.024, 90% CI (0.009, 0.034), supporting the equivalence of the second-order factor structure of temperance across early and middle adolescent groups. Then, we assessed the invariance of all factor loadings (metric invariance). The model with constrained factor loadings fit the data adequately according to the χ² difference and the changes in CFI and RMSEA, ∆χ² = 11.82, df = 12, p = 0.46; ∆CFI = 0.001; ∆RMSEA = 0.001, which suggests that the factor loadings are consistent across stages of adolescence. Finally, we constrained the intercepts (scalar invariance). The results suggested no important group differences in the intercepts, ∆χ² = 44.52, df = 41, p = 0.327; ∆CFI = 0.002; ∆RMSEA = 0.002.
The goodness-of-fit statistics suggested that the measurement model was invariant across early and middle adolescent groups (see Table 4).

Latent Means Differences
To test latent mean differences by gender, we fixed the males' latent means to zero. The analysis revealed significant gender differences in three of the first-order factors: females had higher scores than males on forgiveness and modesty, but lower scores on self-regulation. The gender difference in prudence was not statistically significant.
Regarding latent means differences by adolescence stage, we chose early adolescents as the reference group and estimated the latent mean of the middle adolescent group. The test revealed that differences in forgiveness, prudence, and self-control were not statistically significant. However, the mean difference in modesty was statistically significant (see Table 5). Middle adolescents had a higher score on modesty than early adolescents.
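The latent mean comparisons rest on a z statistic, computed as the estimated latent mean difference divided by its standard error, with the reference group's mean fixed to zero. A minimal sketch with a two-tailed normal p value follows; the function name is hypothetical, and the estimate and standard error in the usage line are invented for illustration, since the estimates are not reproduced in the text.

```python
from scipy.stats import norm

def latent_mean_z(estimate, se):
    """z test for a latent mean difference relative to a reference group fixed at 0."""
    z = estimate / se
    p = 2 * norm.sf(abs(z))  # two-tailed p value from the standard normal
    return z, float(p)

# Usage with a hypothetical estimate and standard error
z, p = latent_mean_z(0.25, 0.10)
print(round(z, 2), round(p, 4))
```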

Concurrent Validity
The dimensions of temperance correlated as expected with proactive and reactive aggression (see Table 6): all the temperance factors were negatively correlated with proactive and reactive bullying aggression. The correlations of modesty with proactive and reactive aggression had small effect sizes (r > 0.10), whereas all other correlations indicated medium (r > 0.20) or large (r > 0.30) effect sizes. Overall, these results suggest that the correlations between temperance dimensions and both types of aggression have theoretical and practical implications [83], supporting the concurrent validity of the Temperance Scale.

Cross-Validation Analysis
To assess the replicability of the model, we tested it in an independent sample. Multigroup invariance analysis provided evidence of configural (χ² = 60.21, df = 48, p = 0.111; SRMR = 0.06; CFI = 0.96; TLI = 0.95; RMSEA = 0.05, 90% CI [0.03, 0.07]), metric, and scalar invariance (see Table 7). This evidence allowed us to conclude that the measurement model replicates across both samples.

Discussion
We analyzed the psychometric properties of a second-order multidimensional model of the VIA-Youth Temperance subscale, following Park and Peterson's [20] conceptualization. Given the gaps in the measurement of this construct, this study contributes to the field, particularly to the assessment of temperance. Overall, our results showed that a single second-order measurement model fit the data better, and cross-validation demonstrated its replicability. Moreover, the results supported measurement invariance, indicating that the measurement model is equivalent across gender and stage of adolescence. This property of the scale is crucial for an in-depth understanding of group differences in temperance. Finally, we confirmed the discriminant and concurrent validity of the scale.

Temperance as a Second-Order Factor
The results confirmed our hypothesis of a second-order structure comprising four first-order factors: forgiveness, prudence, modesty, and self-regulation. Furthermore, when comparing the first-order and second-order models, we found evidence suggesting that the second-order model fits the data better. These findings align with previous research [21,23] indicating that temperance has a second-order structure that emerges from its four strengths. Accordingly, subsequent investigations should analyze the foundations and outcomes of temperance by considering its four dimensions.

Measurement Invariance by Gender and Adolescence Stage
Our findings support the measurement equivalence of the Temperance Scale by gender and stage of adolescence. These results indicate that the scale items may be utilized to measure this construct in both genders and in early vs. middle adolescents. Therefore, unlike previous scales, this scale allows researchers to compare genders and stages of adolescence more fairly and meaningfully.
Latent mean differences indicate that females scored higher than males in forgiveness and modesty. These results are in line with previous research [29,33]. Furthermore, similarly to other studies [31,32], we found that males showed higher self-control than females. The data showed no differences in forgiveness, prudence, and self-control by stage of adolescence. These findings are also congruent with past studies [32,35] that found no relation between temperance and age. However, our findings reveal that middle adolescents scored higher in modesty than early adolescents. This evidence is consistent with that of Brown et al. [32], who found higher levels of modesty in older adolescents. Beyond the present results, further studies should continue exploring gender and age differences to clarify the reasons for these discrepancies and their implications for adolescent development.

Discriminant Validity
The results indicate that each temperance subscale assesses a different dimension of the scale, which supports discriminant validity. In line with previous research [20,21], the results indicate that each temperance dimension evaluates a distinct strength. Our study provides empirical and theoretical evidence of the multidimensionality of temperance. Further studies should examine the variables associated with each first-order dimension of temperance and the consequences of each dimension for adolescents' psycho-emotional development.

Concurrent Validity
In addition, the data provide evidence in favor of concurrent validity. In line with prior research [45,52,87], these results showed significant negative associations between the strengths that make up the virtue of temperance and proactive and reactive aggression. Moreover, the effect sizes of these correlations suggest practical relevance. Overall, these results indicate that temperance and its strengths may be important variables to consider for preventing peer aggression.

Theoretical and Practical Implications
The results of this study suggest that the theory of virtues and character strengths is a generative framework for studying positive behavior. Furthermore, our findings confirm that virtues comprise strengths that influence moral behavior [2,20]. Specifically, the study showed that temperance is a second-order factor with four first-order factors: forgiveness, modesty, prudence, and self-regulation, a factor structure consistent with other studies [21,23]. The study confirms the value of the original classification of character strengths in the VIA. This instrument will make it possible to analyze the potential positive outcomes of temperance and to explore threshold effects and the possible exponential effects of combining two or more strengths [88]. In addition, our findings suggest that the strengths that make up the virtue of temperance are essential for reducing peer aggression and should contribute to understanding the factors underlying bullying. In this regard, temperance strengths are crucial for protecting people from excesses and encouraging positive social relations and adaptive behaviors [2], which could help to decrease peer aggression.
From a practical perspective, the present study highlights the value of a scale with robust psychometric properties for measuring temperance in adolescents. The accurate measurement of temperance is critical for practitioners and schools seeking to enhance adolescents' strengths rather than focus on their weaknesses, thereby improving their mental health and fostering positive development [88]. Furthermore, latent mean differences could support the development of differentiated tools to strengthen these virtues at different stages of adolescence and by gender, offering the opportunity to design more appropriate strategies to encourage adolescents to practice this virtue. Overall, theoretically grounded and psychometrically robust temperance measures allow researchers to generate relevant findings regarding the antecedents and consequences of temperance in adolescence.

Limitations
Although this study provides a helpful scale for researchers, some limitations must be considered. First, data were collected through self-reports; therefore, the students' responses could be influenced by social desirability [89]. Second, our sample consisted of adolescents from northwestern Mexico; therefore, a more diverse sample is desirable to generalize the results, recognizing that student responses may differ by country or region. Third, cross-cultural studies are essential to assess the replicability of the measurement model in a culturally diverse population. Fourth, longitudinal designs are necessary to assess the extent to which temperance changes across childhood and adolescence and how its relationships with bullying aggression evolve over time.

Conclusions
The present research sheds light on the current understanding of temperance as a virtue and the strengths that comprise it. Our findings confirmed the value of the theoretical scheme of temperance [20] as a multifactorial second-order construct. Given the importance of having appropriate measures for evaluating constructs in positive psychology, this scale provides a robust psychometric instrument for assessing temperance in adolescents. We believe that this virtue is crucial for the positive development of youth. Therefore, we consider that future studies should explain the means through which temperance is built in school and family environments.
Additionally, our study provides a valuable instrument with robust validity evidence for evaluating temperance as a multidimensional construct. This makes it possible to assess each of these strengths individually and helps to promote them in school interventions with adolescents.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.