Learning Self-Regulation Questionnaire (SRQ-L): Psychometric and Measurement Invariance Evidence in Peruvian Undergraduate Students

Abstract: Given the theoretical and applied importance of self-regulation in learning, our study aimed to examine the internal structure and psychometric properties of the Learning Self-Regulation Questionnaire (SRQ-L). Five hundred and ninety-six Peruvian university students, from the first to the tenth semester on campuses in Lima, Trujillo, and Cajamarca, participated. Nonparametric scalability, dimensionality, reliability (at the score and item levels), and latent invariance were analyzed. The results showed that reducing the number of response options was necessary, and that reducing the number of items also produced better scaling. Two weakly related dimensions emerged with strong internal validity; item-level reliability was acceptable, and score-level reliability was adequate. Age and gender showed trivial correlations with item variability. Finally, differences between the semesters were obtained in the means, variances, and latent correlations. In conclusion, we propose a better definition of the constructs of autonomy and control measured by the SRQ-L. This article also discusses the limitations and implications of the study.


Introduction
Learning is omnipresent throughout life, and self-regulation is particularly relevant to learning and performance across age groups, for example, adults [1], university students [2,3], adolescents in secondary education [4,5], and children in elementary education [6,7]. This relevance stems from the fact that self-regulation modulates cognitive, affective, and behavioral facets to achieve the desired level of success [8]. Therefore, self-regulation is one of the relevant variables for understanding students' academic performance and adaptation. People who self-regulate their learning have a greater capacity to select and structure the content they must learn [2]. They can adapt their learning strategies to perform better [9][10][11], reflect, and participate with initiative, engagement [12,13], and perseverance. A separate concern is undetected bias, that is, parameters that violate the measurement invariance of the SRQ-L; since this property of the internal structure has not been verified before, it needs to be resolved.
Another aspect is the reliability of the differences between scales and the identification of abnormal differences, both psychometric parameters that help describe discrepancies between scores while controlling for measurement error [32,33]. This type of information is practical because it is expressed in the metric of the observed score [32] and directly influences the interpretation of the score, both for describing the student's position on the measured attribute and for studying the differences between the two measured constructs (control and autonomy). Compared to the estimation of reliability coefficients, information on error variability in the observed-score metric has practical, nonacademic uses.
Regarding reliability, studies usually report internal consistency coefficients, specifically the alpha coefficient; its magnitude typically varies from 0.60 to below 0.90 on the two subscales [19,24,27,28,[34][35][36][37]. More appropriate internal consistency estimates may be required because the alpha coefficient requires several assumptions to be met, such as the absence of correlated errors and tau-equivalence between the items and their construct [38]. Fulfilling these conditions avoids over- or underestimating reliability with the alpha coefficient.
The objective of the present study is to contribute to a more rigorous evaluation of construct validity through a validated Self-Regulation of Learning Questionnaire (SRQ-L) by considering two aspects, the internal structure and measurement invariance, thereby overcoming the gaps of previous studies [29] and obtaining higher metric quality for its application in university learning contexts. This metric quality can be defined by the validity of the items, the dimensional structure, the invariance of its parameters, and the functioning of the current response scaling. These properties were either not previously investigated or were examined with outdated analysis strategies, a situation that suggests a source of potential inconsistency with future studies based on an updated view of the internal structure of the SRQ-L. Given the importance of adapting and using tests in education, we sought to show that the Spanish version of the SRQ-L has solid psychometric properties, with the methodological strength necessary to interpret its results as intended: a functional, valid, reliable, and culturally relevant scale for higher-education students, whose findings can be replicated in other contexts and populations.

Participants
Five hundred and ninety-six university students from the cities of Trujillo (320, 54%), Lima (175, 29%), and Cajamarca (101, 17%), Peru, were selected through nonprobabilistic convenience sampling [39], based on the researchers' access to professional career directors and the facilities provided by teachers for access to classrooms. Participants were excluded from the study if they had previously studied another professional career, were absent at the time of survey administration, did not sign the informed consent, or chose not to complete the survey.
51% of the participants were male, 88% were single, and 72% did not work. Ages ranged between 16 and 40 years (M = 19.85; SD = 2.82). Their socioeconomic level was medium to medium-high. Of the participants, 77% were between the first and third semesters of study, 22% between the fourth and sixth semesters, and 1% between the seventh and tenth semesters. They were enrolled in the professional careers of Engineering (39%), Administration (24%), Accounting (20%), Law (14%), Psychology (2%), and Communications (1%). To compare the participants by study semester, we formed three groups: 1st to 2nd semester (141, 24%), 3rd semester (320, 54%), and 4th to 10th semester (134, 22%). All students in the study signed the informed consent as a condition for completing the survey.

Instrument
Self-Regulatory Learning Questionnaire (SRQ-L) [24]. This is a self-report test consisting of 14 ordinal, 5-point items (from 1 = Not true at all to 5 = Completely true for me). It was translated and adapted to the Peruvian context by Matos [27]; that version was used in this study. Its structure comprises two factors: Autonomy (6 items) and Control (8 items). The first refers to the importance assigned to learning due to internal regulation and is based on intrinsic motivation, in which the basic psychological needs of competence, autonomy, and relatedness are satisfied without seeking external stimuli. The second is oriented toward seeking rewards or external approval, or toward avoiding punishment [23,24].

Procedure
Design. The study was cross-sectional and instrumental [40,41] and employed a quantitative methodology.
Data Collection. Authorization was obtained from the directors of each university, and we solicited the support of the Psychological Orientation area so that the students could be evaluated by psychologists and psychology interns trained for this purpose; administration took an estimated 15 min. The test was applied in the middle of the academic semester, in coordination with the teachers to facilitate classroom access. The instruments were applied in groups of approximately 35 students, to whom the evaluation objectives were explained; participation was voluntary, following the ethical principles of psychologists [42,43]. The instructions also communicated the anonymity of responses, the availability of support in case of questions about the survey content, the absence of participation incentives, and the freedom to stop filling out the survey. The applied material consisted of the informed consent and the study instrument.
Analysis. The quantitative analysis focused on carefully examining (a) the univariate characteristics of the items, (b) the internal structure, and (c) the reliability. Differences according to variables that could explain variability in self-regulation were identified using latent variables within the methodological framework of structural invariance analysis. The general analytic strategy was to apply several approaches in order to reduce the dependence of the conclusions on a single analytical procedure [44].
Item analysis. As the content of the items referred to two control attributes as individual expressions [45], their endorsement could generate response patterns associated with the functional distribution of the response categories [45,46]. To examine this, we estimated thresholds for the items, which are points of intersection between adjacent response categories associated with the probability of choosing one or the other category conditional on the level of the latent attribute (or construct). The ordering (or disordering) of these thresholds indicates the good (or bad) performance of the response categories [46]. Estimation was carried out within the partial credit model [47,48], derived from Rasch modeling for polytomous items [49]. For this estimation, the eRm program [50] was used within the IANA graphical interface [51] in the R program [52]. Additionally, the correlation of the items with sex and age was examined as a partial expression of the content validity of the items [53,54].
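The threshold-ordering criterion described above can be sketched as follows. This is a minimal illustration in Python with hypothetical threshold values (the study itself estimated thresholds with the eRm package in R); the item names and numbers below are invented for demonstration only.

```python
# Sketch: checking the ordering of category thresholds estimated under a
# partial credit model. The threshold values below are hypothetical
# illustrations, NOT the article's estimates (those come from eRm/IANA in R).

def disordered_items(thresholds_by_item):
    """Return the items whose adjacent category thresholds are not
    strictly increasing (i.e., show threshold disordering)."""
    flagged = []
    for item, taus in thresholds_by_item.items():
        if any(t2 <= t1 for t1, t2 in zip(taus, taus[1:])):
            flagged.append(item)
    return flagged

# Hypothetical thresholds for three 5-option items (4 thresholds each).
example = {
    "item_a": [-1.8, -0.6, 0.5, 1.7],   # ordered: categories work as intended
    "item_b": [-1.2, -1.5, 0.4, 1.1],   # second threshold below first: disordering
    "item_c": [-0.9, -0.1, 0.05, 0.1],  # ordered, but barely differentiated
}
print(disordered_items(example))  # -> ['item_b']
```

Note that an item such as "item_c" passes the strict-ordering check yet still shows minimal distances between thresholds, which is the second problem (weak differentiation) examined in the study.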
Nonparametric analysis. A nonparametric approach [55] was applied to the ordinal items of the SRQ-L [56] to verify several fundamental, precursor properties [57] of the instrument's scores, independently of the strong assumptions of latent variable models. Three essential characteristics [58] of the monotone homogeneity model (MHM) were explored: scalability (using the coefficient H), local independence (responses to the items do not influence one another; examined through three conditional association indices, W(1), W(2), and W(3) [59]), and monotonicity (an increasing function between the item and the latent attribute; evaluated by comparing the number of observed and expected violations of the monotonic model [55]). The procedure was performed using the Mokken program [52,60].

SEM analysis. The evaluation of dimensionality was complemented by confirmatory factor analysis for categorical data, using the weighted least squares mean- and variance-adjusted estimator (WLSMV) [61]; fit was assessed with approximate fit indices: CFI (≥0.95), TLI (≥0.95), RMSEA (≤0.05), Gamma-hat (G-h) [62], and SRMR (≤0.05). Regarding measurement invariance, the sequence of steps appropriate for categorical variables was implemented [63], imposing cumulative restrictions on the item parameters. We started with configural invariance and then introduced cumulative equality constraints on thresholds, factor loadings, intercepts, and, finally, residuals. Since how to judge fit in measurement invariance testing is still a matter of debate [64], McDonald's noncentrality index (Mc) [65] was used together with the CFI, given its statistical robustness [64].
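The fit cutoffs listed above, and a decision rule for comparing nested invariance models, can be sketched as follows. The delta-CFI tolerance of 0.01 is a widely used heuristic and our own illustrative assumption, not a rule stated in the article; the numeric values in the usage lines are hypothetical.

```python
# Sketch: applying the approximate-fit cutoffs given in the text, plus a
# common delta-CFI rule of thumb for nested invariance models. The example
# fit values are hypothetical, not the article's estimates.

CUTOFFS = {"cfi": 0.95, "tli": 0.95, "rmsea": 0.05, "srmr": 0.05}

def acceptable_fit(cfi, tli, rmsea, srmr):
    """CFI/TLI must be at or above their cutoffs; RMSEA/SRMR at or below."""
    return (cfi >= CUTOFFS["cfi"] and tli >= CUTOFFS["tli"]
            and rmsea <= CUTOFFS["rmsea"] and srmr <= CUTOFFS["srmr"])

def invariance_holds(cfi_constrained, cfi_baseline, tol=0.01):
    """Heuristic: the constrained model should not worsen CFI by more
    than ~0.01 relative to the less constrained model."""
    return (cfi_baseline - cfi_constrained) <= tol

print(acceptable_fit(cfi=0.97, tli=0.96, rmsea=0.04, srmr=0.03))   # True
print(invariance_holds(cfi_constrained=0.965, cfi_baseline=0.970)) # True
```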
After corroborating measurement invariance, latent mean differences were estimated using d [66]; differences in variances were assessed with the standardized variance heterogeneity index, SVH [67], and differences in correlations with the standardized index q [66]. All SEM analyses were performed with the lavaan program [52,68].
Reliability. Reliability was estimated at the item and score levels for each subscale. For the items, it was calculated using the attenuation-corrected coefficient [69], given its lower bias and computational ease [70]; the minimum acceptable value is around 0.30 [71]. At the score level, (a) consistent with the nonparametric model, the MS coefficient was estimated [56], and (b) consistent with linear SEM modeling, the coefficient ω was estimated [72]. For comparison purposes, the coefficient α was also obtained.
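The two score-level estimators (α and ω) can be sketched for a one-factor scale as follows. All numbers are illustrative, not the article's data; ω is computed under the standard congeneric assumptions (standardized loadings, uncorrelated residuals).

```python
# Sketch: coefficient alpha (from an item covariance matrix) and
# coefficient omega (from standardized factor loadings) for a
# one-factor scale. Illustrative values only.

def cronbach_alpha(cov):
    """cov: square item covariance matrix as a list of lists."""
    k = len(cov)
    item_var = sum(cov[i][i] for i in range(k))
    total_var = sum(sum(row) for row in cov)
    return (k / (k - 1)) * (1 - item_var / total_var)

def mcdonald_omega(loadings):
    """omega = (sum of loadings)^2 / ((sum of loadings)^2 + sum of residuals),
    assuming standardized loadings and uncorrelated residuals."""
    s = sum(loadings)
    resid = sum(1 - l ** 2 for l in loadings)
    return s ** 2 / (s ** 2 + resid)

# Three items with unit variances and uniform covariance 0.5
# (the tau-equivalent case, under which alpha is unbiased).
cov = [[1.0, 0.5, 0.5],
       [0.5, 1.0, 0.5],
       [0.5, 0.5, 1.0]]
print(round(cronbach_alpha(cov), 3))               # 0.75
print(round(mcdonald_omega([0.7, 0.7, 0.7]), 3))   # 0.742
```

When the items depart from tau-equivalence or show correlated errors, the two estimates diverge, which is why the study reports ω alongside α.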
Practical indices of the measurement error of both scores were also estimated, using the standard error of measurement (SEM) and the standard error of measurement of the difference (SEM_D) [33]. For the latter, the formula SEM_D = √(SEM²_F1 + SEM²_F2) was used, where SEM_F1 and SEM_F2 correspond to the SEM of each compared score. To obtain critical values, the SEM_D was multiplied by the z values of the standard normal curve (1.43, 1.64, 1.96, and 2.57), consistent with the 0.15, 0.10, 0.05, and 0.01 levels, two-tailed, respectively. Finally, to estimate the statistical abnormality of the difference between the F1 and F2 scores obtained by a subject from a clinicometric approach, the standard deviation of the difference was calculated [73]: SD_D = SD√(2 − 2r_XY), where SD is the standard deviation in the standardized score metric (for this study, t-scores, i.e., M = 50, SD = 10), and r_XY is the correlation between the compared scores (usually Pearson's correlation).
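These error indices can be sketched numerically. The subscale reliabilities (0.84 and 0.91) and the correlation value in the usage lines are hypothetical, chosen only to show the arithmetic; the t-score metric (M = 50, SD = 10) follows the text.

```python
import math

# Sketch of the practical error indices described above:
# SEM, SEM_D, critical differences, and the clinicometric SD_D.
# Reliabilities and the correlation are hypothetical.

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - rxx)."""
    return sd * math.sqrt(1 - reliability)

def sem_d(sem_f1, sem_f2):
    """Standard error of measurement of the difference between two scores."""
    return math.sqrt(sem_f1 ** 2 + sem_f2 ** 2)

def critical_difference(sem_diff, z):
    """Minimum observed difference required at the chosen z level."""
    return z * sem_diff

def sd_of_difference(sd, r_xy):
    """Clinicometric SD of the difference: SD * sqrt(2 - 2*r_xy)."""
    return sd * math.sqrt(2 - 2 * r_xy)

# Hypothetical subscale reliabilities of 0.84 and 0.91 on the t-score metric.
s1, s2 = sem(10, 0.84), sem(10, 0.91)          # 4.0 and 3.0
d = sem_d(s1, s2)                              # 5.0
print(round(critical_difference(d, 1.96), 1))  # 9.8 (two-tailed .05 level)
print(round(sd_of_difference(10, 0.0), 2))     # 14.14 (uncorrelated scores)
```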

Item Analysis
Results are presented in Table 1. The means of the F1 (Autonomy) items were generally higher than those of F2 (Control), indicating a degree of independence of the behaviors in both factors. In contrast, apparently greater variability of behaviors was found in F2 (as observed from the standard deviations). Distributional non-normality was moderately variable (skewness and kurtosis). The correlations between items of scale F1 and scale F2 tended to be lower than the inter-item correlations within each scale, suggesting clear divergent and convergent relationships, respectively. For the correlations of the items with sex and age, Type I error was controlled by adjusting the nominal alpha with the Bonferroni method (in F1: 0.05/6 items = 0.008; in F2: 0.05/8 items = 0.006); none of the correlations with participants' sex or age was statistically significant. Although no cut-off points were established for the magnitude of the item-criterion correlations, the correlations obtained could be considered between trivial and low. Examination of the ordering of the response options (see Figure 1) showed a pattern of disorder, strongly concentrated in options 1 and 2, and 4 and 5; a tendency toward less differentiation between thresholds 3 and 4 was also observed. Because these observations (disordering and weak differentiation) undermine the appropriate interpretation of the scale scores [46], category 1 was merged with category 2, and category 4 with category 5. The new scaling with three response options effectively maintained the ordering of the thresholds, except in Items 7 and 14 of the F2 scale (Control).
Without making other modifications, these items were kept for the following analyses to evaluate their effect on parameterization with factor analysis.
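The per-factor Bonferroni adjustment applied above is a simple division of the nominal alpha by the number of items tested within each factor; a minimal sketch:

```python
# Sketch of the per-factor Bonferroni adjustment used for the
# item-criterion (sex, age) correlations.

def bonferroni_alpha(nominal_alpha, n_tests):
    """Adjusted per-test alpha: nominal alpha divided by number of tests."""
    return nominal_alpha / n_tests

# F1 (Autonomy) has 6 items; F2 (Control) has 8, as in the article.
print(round(bonferroni_alpha(0.05, 6), 3))  # 0.008
print(round(bonferroni_alpha(0.05, 8), 3))  # 0.006
```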

Nonparametric Dimensional Analysis
Scalability. The results of the Mokken modeling appear in Table 2. The scalability coefficients (H) were obtained in the total sample and in the three semester groups. The decision to remove items was based on (a) a pattern of comparatively low H coefficients (<0.30) [74], (b) comparatively high standard errors, and (c) the moderate invariance of the two previous patterns across the compared groups (three levels of academic semester). Accordingly, Items 12 and 11 of F1 (Autonomy) and Items 4, 5, and 13 of F2 (Control) showed the lowest scalability, considering that the lower limit of their 95% confidence interval fell below 0.30. The final evaluation of the items appears in the last column of the scalability analysis in Table 2.

Monotonicity. With the reduced version of the SRQ-L, no violation of monotonicity (#vi) greater than two between items, or statistically significant (#zsig), was found, and the CRIT statistic for each item was less than 40 [75]. Overall, the results indicate that the monotone homogeneity model is satisfactory.
Local independence. Finally, local independence in the reduced version, evaluated with the indices W(1), W(2), and W(3) [59], indicated that the items of F1 (Autonomy) contained no inter-item associations of significant magnitude (W(1) between 0.111 and 0.937; W(2) between 2.426 and 3.605; W(3) between 0.015 and 1.893). In F2 (Control), there was substantial inconsistency between the indices (concerning Item 2): W(1) between 0.009 and 0.618; W(2) between 3.322 and 6.714; W(3) between 0.004 and 2.355. Therefore, local independence was ultimately evaluated through linear SEM modeling as a confirmatory option [76].
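The item-removal rule used in the scalability analysis (flagging items whose 95% confidence interval for H falls below 0.30) can be sketched as follows. The H values and standard errors here are hypothetical, not those of Table 2.

```python
# Sketch of the scalability decision rule: flag items whose 95% CI lower
# bound for the Mokken H coefficient drops below 0.30. Values are
# hypothetical, NOT the article's Table 2 estimates.

Z95 = 1.96

def weak_items(h_by_item, threshold=0.30):
    """h_by_item maps item -> (H, SE); flag items whose CI lower bound
    (H - 1.96 * SE) falls below the threshold."""
    return [item for item, (h, se) in h_by_item.items()
            if h - Z95 * se < threshold]

example = {
    "item_1":  (0.45, 0.04),  # lower bound 0.372 -> keep
    "item_4":  (0.31, 0.05),  # lower bound 0.212 -> flag
    "item_12": (0.28, 0.04),  # lower bound 0.202 -> flag
}
print(weak_items(example))  # -> ['item_4', 'item_12']
```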

Invariance
Measurement invariance. After verifying the internally most acceptable model, we evaluated measurement invariance. Table 4 presents the estimates and the fit differences. The groups chosen for this objective were the study semesters completed by the participants, grouped into three categories: Group 1 or initial (the first two semesters), Group 2 or intermediate (3rd semester), and Group 3 or advanced (4th to 10th semesters). Age and gender were not included in this analysis because they showed correlations around zero with the SRQ-L items. The first stage of the invariance analysis was verifying the factorial structure (configural invariance), where the results were acceptable, although with slight variation in some indices (Mc and RMSEA). Although the standardized factor loadings of the model in each group are not reported, the magnitudes of their differences [79] were in the trivial (<0.10) or small (≥0.10) range, and none were of moderate (≥0.20) or large (≥0.30) magnitude. The invariance of the thresholds was also satisfactory, and its difference from the configural model was slight. Minor differences were maintained in the comparisons of the remaining models, up to and including the invariance of the residuals. At this level of strict invariance, the fit indices were uniformly satisfactory, except for Mc.

Structural invariance. After ensuring measurement invariance, Table 5 shows the results of the structural comparisons between groups on the latent parameters (means, variances, and correlations). For the comparison of g1 vs. g2 and g3, the reference group was g1; for the comparison between g2 and g3, the reference group was g3.
Regarding the latent means, the standardized differences showed that in the Autonomy construct (F1), the between-group differences were close to zero and statistically nonsignificant, except that the students of group g3 (fourth to tenth semesters) scored somewhat higher (d between 0.30 and 0.50) [66] than students in the third semester (group g2). In the Control construct (F2), the third-semester group (g2) showed lower scores than the students from the first two semesters (g1), but the size of the difference was negligible. The advanced-semester group (g3) scored higher than group g2 (3rd semester), with a difference of small magnitude.
Regarding the degree of individual differences (latent variances) in the compared semesters, students from more advanced semesters (g3, 4th to 10th semester) showed more variability in Autonomy (F1), and third-semester students (g2) showed more variability in Control (F2). However, the comparisons with the SVH index were small (<0.56) [67], both within each factor and between factors. Finally, in the evaluation of the heterogeneity of the correlations [80], the relationship between the Autonomy and Control factors was found to differ between the three groups (Q = 6.880, df = 2, p < 0.05); the magnitude of this difference, I² [81], can be considered strong (I² = 90.92%). As shown in Table 5, the covariation was higher in group g3 than in the rest.

Discussion
The purpose of the study was to evaluate the internal structure and psychometric properties of the SRQ-L [24] in a group of Peruvian university students. The motivation was to explore its psychometric functioning in greater depth, given that previous studies in moderately similar groups [29] did not seem to identify some of its properties and failed to evaluate other important ones, specifically measurement invariance, response-scaling performance, divergent item-factor relationships, and correlated errors.
The analyses related to dimensionality produced results that appear to optimize the validity of the short version, given that the content relationships of the SRQ-L were maximized by selecting the items with better scaling and discriminative power; that is, those that better differentiated the subject's position on the score continuum. Taken together, the selected items helped to better define the derived score through more discriminative, representative items with better psychometric evidence. In the reduced version, the Autonomy construct now includes behaviors of following suggestions to optimize learning and actively participating in class to understand and improve skills. Control is represented by behaviors of avoiding the disapproval of others, actively participating in class, and following suggestions to achieve a good grade and project a good image in the eyes of others. Two items in this shorter version (Items 7 and 14 of the Control scale) showed markedly different scaling compared with the rest, because the distance between their thresholds was minimal and one of them still showed threshold disorder. However, these items maintained sufficiently high inter-item correlations, high scalability, and a pattern of statistical indicators indistinguishable from the rest of the items; therefore, they were retained in the final version of the SRQ-L. Because previous studies did not examine this, we cannot tell whether this result reflects sampling variation or a consistent feature of these two items. One implication is that further examination on a new sample is required to characterize this particular result.
The instrument's content derived from our analyses reduced the behaviors sampled relative to the original version (six items in the Autonomy factor and eight in the Control factor), which may limit the validity of the construct as initially designed. However, the statistical fit of the full version under nonparametric scaling was not satisfactory, a situation usually related to multidimensionality or excess construct-irrelevant variance. The correlated errors between items detected by both procedures (nonparametric and SEM) confirmed that the structure of the original SRQ-L contained inter-item relationships irrelevant to the measured construct, and even that some items could be better related to the other factor. Furthermore, these problems strongly influenced the correlation between the constructs: the high correlation (>0.50) initially estimated between them was drastically reduced in the shortened version (to approximately zero).
Regarding reliability, the estimates showed magnitudes acceptable for describing the construct in a group of participants. However, they are possibly insufficient if the abbreviated SRQ-L is used to describe the individual effects of an intervention, for which higher levels of reliability (e.g., >0.89) are generally required. Interestingly, the brevity does not appear to have seriously affected the reliability of the scores, and the small number of items did not bias the estimates toward high reliability. This suggests that the content is not redundant or repetitive and still samples moderately different behaviors linked to its constructs. On the other hand, the clinical value indicators reported in Table 3 are helpful for research, but especially for professional practice [32,33]. Multiplied by the standardized values of the normal curve, the statistically infrequent (abnormal) differences show an extreme range, especially at the 95% and 99% levels. Applied use may require choosing the 90% level to reasonably determine abnormal differences, or the 85% level to reduce Type II error.
Regarding the study's implications, the reduced version can be easily incorporated into screening evaluations during certain periods of the academic semester, leaving room for other instruments. It can also be very efficient for measuring change in self-regulatory skills, since these are sensitive to interventions in the educational context; indeed, some experimental studies have shown that self-regulation skills can be strengthened [4,82]. The development of a reduced version has known advantages for evaluation practice. For example, it reduces examinee fatigue when a set of measures is applied at a fixed time or in a longitudinal design. It also reduces the number of estimable parameters when a short measure is used in structural equation modeling; the sampled behaviors maintain stronger statistical covariations, and replicability is better ensured. Although the original version is not excessively long, a parsimonious measure with good psychometric properties is likely to be better accepted. A final implication of the study is that, since the instrument is used in various regions worldwide, our results provide working hypotheses on internal structure and group differences.
The reduced version also achieved invariance of its main psychometric parameters: the compared groups of students from different academic semesters showed homogeneity in autonomy and control for learning, with a slightly greater tendency among students from the more advanced semesters. This finding would indicate that, during professional training, autonomy in learning supports learning achievement, wellbeing, and adjustment in the classroom [14]. In other contexts, promoting self-regulated learning among students in their first year at university is necessary to help them stay longer and avoid dropping out [83]. The difference could also be explained by the period in which data collection was carried out (the middle of the academic semester); the results might have differed had the measurement been done at the beginning.
One index that showed inconsistency with the others (CFI, TLI, G-h) was the Mc [65], a fact possibly associated with differences in the validity of the items with respect to their construct (factor loadings) or with the imbalance of the compared groups in a three-group context [84], as occurred in the present study. However, since the remaining fit indices and the comparisons between them were satisfactory, the invariance evaluations can also be considered valid: the construct is measured invariantly across the three semester groups compared. The minor differences in the structural parameters (e.g., means, variances, and latent correlations) are not tests of measurement invariance, because such differences, even slight, concern the latent attribute once differences in measurement have been controlled for (i.e., measurement invariance).
Regarding the study's limitations, we did not evaluate the replicability of the results, since a similar-sized sample was not available in this phase of the study. Although replicability can be partially inferred from the invariance study [85], a complete study with a larger sample is required. Another limitation is the possible lack of representativeness of the participants, which does not ensure the generalization of the results; the study did not establish the population representativeness of the sample, so the degree of representativeness is inconclusive. Finally, it is unknown how the scores from this new version are associated with behaviors and constructs relevant to practical academic life, such as academic performance, student well-being, and other adaptive behaviors. These limitations suggest that, although this version of the SRQ-L achieves a better psychometric definition in the sample studied, additional studies are required to verify these results.

Conclusions
The Hispanic version of the SRQ-L applied to Peruvian students is a measure with satisfactory evidence of validity in its internal structure. The results showed that the two constructs were maintained, but without two items from the Autonomy scale and two from the Control scale. Because disordering and trivial distinctions among some response thresholds were detected, the scaling of the items had to be modified from the original five response options to three.
The items of this new version show moderate scalability (measured with Mokken scaling analysis) and a satisfactory fit to the monotone homogeneity model. In the structural equation modeling, the items proved to be moderately to strongly associated with their constructs. The reliability of the scores was appropriate for interpreting the construct in groups, and the reliability of the items was acceptable. Measurement invariance was maintained across the chosen semester groups. The structural invariance analysis indicated that this characteristic was not fulfilled for the latent means and variances between some semesters; this result was interpreted as reflecting possible natural differences among the groups, and the size of these differences was generally small. Finally, the association between the two constructs measured by the SRQ-L (autonomy and control) varied between the compared semester groups, indicating a possible evolution of these constructs in line with their interdependence. Although these results point to a more complete and optimal psychometric characterization of the SRQ-L, possible intercultural variability should be evaluated, avoiding induction of its validity from only one or a few studies [86][87][88].