1. Introduction
Empirical evidence has shown that students’ learning approaches contribute significantly to their academic success in higher education (e.g., [
1,
2]). Learning approaches can be conceived of as an individual's adopted predispositions when dealing with tasks and the strategies used to process learning materials, which can be deep or surface in nature [
3,
4]. A deep approach to learning involves concentrating on the latent meanings of the material to be learned, while a surface approach entails memorization and gives less priority to the messages conveyed in the presented tasks. Deep learning has been an emerging focus of higher education studies in preparing future leaders for our increasingly diverse society [
2]. For many decades, educators have been challenged by the proper conceptualization and operationalization of students’ approaches to learning (SAL).
The SAL theory of Marton and Säljö [
5,
6] uses phenomenography coupled with some constructivist perspectives of Biggs [
7,
8] and has provided theoretical frameworks for conceptualizing students’ approaches to learning. This is evident in the way approaches to learning have been defined to include the motives, predispositions, styles, and strategies students use when processing learning tasks. Moreover, the classification of the approaches students adopt when learning into ‘surface’ and ‘deep’ has greatly influenced SAL measuring instruments. A widely studied instrument for measuring SAL is Biggs’ [9] study process questionnaire [10]. This instrument has undergone several revisions and validations, evolving from its initial 72-item version to the present 20-item two-factor revised study process questionnaire [
9]. It has gained equally wide acceptance among educators, with many studies on its psychometric properties, and Cronbach’s alpha values ranging from 0.57 to 0.85 have been reported as evidence of the items’ internal consistency [9,11].
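To make the reliability claim concrete, the following is a minimal Python sketch of how Cronbach's alpha for one subscale could be computed; the data file and column names (q-prefixed item columns) are hypothetical and are not taken from the cited studies.

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a set of item scores (rows = respondents, columns = items)."""
    k = items.shape[1]                              # number of items in the subscale
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical usage: columns q1 ... q20 hold the Likert responses, and the deep
# approach items are taken here from the commonly cited R-SPQ-2F scoring key.
# responses = pd.read_csv("rspq2f_responses.csv")
# deep_cols = [f"q{i}" for i in (1, 2, 5, 6, 9, 10, 13, 14, 17, 18)]
# print(cronbach_alpha(responses[deep_cols]))
```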
However, the cultural specificity of the two-factor revised study process questionnaire (R-SPQ-2F) has generated considerable discussion, with inconclusive results when the instrument has been translated into different languages (e.g., [
12,
13]). Apart from the two models hypothesized by Biggs, Kember and Leung [
9], several alternative models have been proposed, and some items were deleted to achieve modest fits in explaining the underlying factor structure of the instrument. The current study was framed with the sole aim of exploring and comparing alternative models that best explain the construct validity of the R-SPQ-2F when translated into Norwegian. This article is a continuation of the work reported in [14], where the Biggs et al. [9] hypothesized models were investigated and found to represent Norwegian data poorly, yielding non-admissible solutions. In the earlier work, a new model for the R-SPQ-2F was proposed and confirmed, and the model fit and scale reliability were investigated and reported. The purpose of this paper is to contribute to this body of research and to expose some observable methodological weaknesses inherent in some of the hypothesized models reported in the literature.
2. Literature Review
Studies on the cultural specificity of the R-SPQ-2F can broadly be classified into two major categories. The first category represents those that report first-order two-factor structures—the deep approach (DA) and the surface approach (SA)—as the best explanatory models for the instrument with ten items on each subscale [
15,
16,
17,
18]. This first category can further be divided into those that include error covariance—the presence of a systematic commonly shared variance—between indicators (e.g., [
17]) and those that did not include the covariance (e.g., [
16]). However, Biggs et al. [
9] were the first to start a discussion on the factor structure of their then newly developed instrument, the R-SPQ-2F, by hypothesizing and testing two models. The first model was a first-order four-factor model containing ‘deep motive,’ ‘deep strategy,’ ‘surface motive,’ and ‘surface strategy’ measured by five items each. This model was tested and found to fit their data (N = 495) with a comparative fit index (CFI) of 0.904 and a standardized root mean squared residual (SRMR) of 0.058 [
9]. Further, CFIs of 0.997, 0.998, 0.988, and 0.998 and SRMRs of 0.01, 0.02, 0.02, and 0.02 were also reported for the ‘deep motive,’ ‘deep strategy,’ ‘surface motive,’ and ‘surface strategy’ subscales, respectively. The second model was a first-order two-factor model containing deep and surface approaches with two indicators each—motive and strategy—obtained by item parceling (summing the scores of the five corresponding items from the first model). The results for the second model also suggest a good fit, with a CFI of 0.992 and an SRMR of 0.015, both of which are within the cutoffs proposed by Hu and Bentler [
19].
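To illustrate the item-parceling step just described, here is a minimal Python sketch that sums each set of five items into the four motive/strategy parcel indicators; the item-to-subscale mapping follows the commonly cited R-SPQ-2F scoring key, and the column names (q1 to q20) are an assumption about how the responses are stored.

```python
import pandas as pd

# Item-to-subscale mapping following the commonly cited R-SPQ-2F scoring key
# (deep motive, deep strategy, surface motive, surface strategy); treat it as
# an assumption, not a reproduction of the cited studies' data handling.
PARCELS = {
    "deep_motive":      [1, 5, 9, 13, 17],
    "deep_strategy":    [2, 6, 10, 14, 18],
    "surface_motive":   [3, 7, 11, 15, 19],
    "surface_strategy": [4, 8, 12, 16, 20],
}

def build_parcels(responses: pd.DataFrame) -> pd.DataFrame:
    """Sum the five items of each subscale into one parcel score per respondent.
    Assumes the 20 items are stored in columns named 'q1' ... 'q20'."""
    return pd.DataFrame({
        name: responses[[f"q{i}" for i in items]].sum(axis=1)
        for name, items in PARCELS.items()
    })

# parcels = build_parcels(pd.read_csv("rspq2f_responses.csv"))
```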
These two hypothesized models of Biggs et al. [
9] have stirred heated debates among educators and methodologists when subjected to confirmatory analysis in independent cultural contexts. For example, the two models were tested and found to fairly explain the factor structure of the R-SPQ-2F when translated into Spanish in a study involving 836 undergraduate students, of which 314 were used for exploratory factor analysis and the remaining 522 were used for confirmatory factor analysis [
15]. An alternative first-order two-factor model was proposed and tested, containing the deep and surface approaches measured by their corresponding ten items each, as theorized in [
9]. The results suggest a modest fit with a significant χ²(169) = 645.77, p < 0.05, goodness of fit index (GFI) = 0.95, SRMR = 0.09, root mean square error of approximation (RMSEA) = 0.07, non-normed fit index (NNFI) = 0.91, CFI = 0.92, parsimony normed fit index (PNFI) = 0.80, and parsimony goodness of fit index (PGFI) = 0.76. In another study, Önder and Besoluk [
18] reported a Turkish validation of the instrument when administered to 528 undergraduate students. Their findings also identified a first-order two-factor model as the best explanation for the construct validity of the R-SPQ-2F. Their results involved a significant χ²(166) = 487.95, p < 0.05, GFI = 0.89, SRMR = 0.07, RMSEA = 0.06, NNFI = 0.90, CFI = 0.93, PGFI = 0.92, incremental fit index (IFI) = 0.93, and relative fit index (RFI) = 0.88. A major difference between these results and that of Justicia, Pichardo, Cano, Berbén and De la Fuente [
15] was the inclusion of error covariance between items 8 and 10 as well as between items 11 and 20.
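As an illustration only, the sketch below specifies such a first-order two-factor CFA, including the two error covariances reported by Önder and Besoluk, using the third-party semopy package; the item assignment follows the commonly cited scoring key, the column names are hypothetical, and this is neither the specification nor the software (Mplus) used in the present study.

```python
import pandas as pd
import semopy  # third-party SEM package; pip install semopy

# lavaan-style description: deep approach (DA) and surface approach (SA), each
# measured by their ten theorized items, plus the two error covariances
# (items 8-10 and 11-20) that Önder and Besoluk added to reach an acceptable fit.
MODEL_DESC = """
DA =~ q1 + q2 + q5 + q6 + q9 + q10 + q13 + q14 + q17 + q18
SA =~ q3 + q4 + q7 + q8 + q11 + q12 + q15 + q16 + q19 + q20
q8  ~~ q10
q11 ~~ q20
"""

# Hypothetical usage with columns q1 ... q20:
# responses = pd.read_csv("rspq2f_responses.csv")
# model = semopy.Model(MODEL_DESC)
# model.fit(responses)
# print(model.inspect())            # loadings, covariances, error variances
# print(semopy.calc_stats(model))   # chi-square, df, CFI, TLI, RMSEA, etc.
```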
Non-admissible solutions and poor fits for the hypothesized models in [
9] were also reported in a study involving 269 university and non-university students [
17]. Following confirmatory factor analysis results, a first-order two-factor model was identified as the best explanation for the construct validity of the R-SPQ-2F. A significant χ²(168) = 259.32, p < 0.05, was also reported, coupled with SRMR = 0.07, RMSEA = 0.05, Tucker-Lewis index (TLI) = 0.95, and CFI = 0.96. Similar to the findings of Önder and Besoluk [
18], an error covariance was also defined between items 4 and 14 in order to achieve a good model fit. Corroborative results can also be found in a Chinese validation of the R-SPQ-2F involving 439 university students, in which a first-order two-factor model was also reported [
16].
Table 1 presents a juxtaposition of the findings of these studies for easy comparison.
There seems to be a consistency in the results of previous studies presented in
Table 1. They corroborate the theoretical explanation of indicators measuring the DA and the SA as proposed in [
9], while excluding the additional subdivision of each factor into motive and strategy. The negative standardized correlation coefficients found in all the studies between the deep approach and the surface approach subscales are indicative of discriminant validity. A close look at the results of Merino and Kumar [
17] as well as Önder and Besoluk [
18] suggests a better fit of their models compared to the others. This can be deduced from their reduced χ²-values and RMSEA values falling within the range suggested in [
19,
20]. However, the inclusion of error covariance between some indicators in their models could pose some complications in the application and interpretation of the scale item scores by classroom teachers.
The second broad category of studies on the cultural specificity of the R-SPQ-2F comprises reports of two first-order and four first-order factor structures with some items deleted to achieve good fits (e.g., [
10,
21]). The number of items deleted ranged from two to five. Immekus and Imbrie [22], after establishing non-admissible solutions of the hypothesized Biggs et al. [
9] model, subjected the data from their first cohort of 1490 university students to an exploratory factor analysis (EFA). Their EFA yielded four extracted latent factors, which they identified as ‘deep motive,’ ‘deep strategy,’ ‘surface motive,’ and ‘surface strategy’ after promax rotation. Five items that exhibited substantial cross-loadings were removed from the model. The first-order four-factor model was then subjected to confirmatory factor analysis in an independent cohort of 1533 university students. The results of the confirmatory analysis suggest a modest fit with a significant χ²(114) = 568.54, p < 0.05, RMSEA = 0.05, and CFI = 0.96. Surprisingly, relatively high positive correlations of 0.76 and 0.59 were found between ‘deep motive’ and ‘deep strategy’ as well as between ‘surface motive’ and ‘surface strategy,’ respectively.
No empirical evidence was found to support the first Biggs et al. model in the Japanese validation of the R-SPQ-2F reported in [21]. However, a modest fit was confirmed for the second Biggs et al. model, a first-order two-factor model with item parceling on the deep and surface approach latent factors. The study involved 269 university students distributed across different programs in a Japanese tertiary institution. The results of their confirmatory analysis did not include the χ²-value; instead, an RMSEA = 0, a CFI = 1, and a TLI = 1, coupled with a positive correlation coefficient of 0.30 between the deep and surface approach latent factors, were reported [21]. There are some reservations with respect to these results. First, the goodness of fit (GOF) indices indicate a perfect fit of the model, which appears unrealistic. An observed methodological issue could stem from the degrees of freedom (though not reported), which is 1. This could make it difficult for the variance/covariance matrix to be positive definite. Unfortunately, nothing was mentioned in the article with respect to this matrix. Another methodological difficulty, one that could even lead to the rejection of this model, is the positive correlation of 0.30 reported between the deep and surface approaches. This indicates that the model does not discriminate between the deep and surface approaches, which is contrary to both the theoretical and the conceptual interpretations of the instrument.
Moreover, Stes, De Maeyer and Van Petegem [12] also could not find any supportive empirical evidence for either model hypothesized by Biggs et al. [9] in the validation of their Dutch version of the instrument, which involved an effective sample of 1974 students distributed across diverse university programs. For this reason, an exploratory factor analysis was conducted on 963 randomly selected cases from the total sample, using maximum likelihood for factor extraction and an oblique rotation. Five factors were initially identified, and these were later collapsed into four factors—study is interesting (SI), spending extra time (ST), minimal effort (ME), and learning by heart (LH)—after a series of confirmatory factor analyses and item deletions. The final fitted solution was a first-order four-factor model with three items measuring SI, four items measuring ST, five items measuring ME, and three items measuring LH. The final chi-squared statistic as well as the degrees of freedom were not reported. However, some GOF indices, such as GFI = 0.95, adjusted goodness of fit index (AGFI) = 0.93, RMSEA = 0.06, CFI = 0.94, and PGFI = 0.66, were reported as evidence of a good fit for their model. Further, relatively high correlation coefficients of 0.76 and 0.62 were found between SI and ST as well as between ME and LH, respectively.
In an attempt to reconcile the divergent and inconclusive model results on the R-SPQ-2F, Socha and Sigler [13] conducted a validation study of the instrument involving 868 university students. In their study, eight models were compared using confirmatory factor analysis, and a first-order two-factor solution, involving the deletion of two items from the original version, was found to be the best explanation for the construct validity of the instrument. Their final results included a significant χ²(134) = 504.83, p < 0.05, SRMR = 0.05, RMSEA = 0.06, and CFI = 0.95, and a negative correlation of −0.38 between the deep and surface approach latent factors. Similar results can also be found in another Spanish validation of the R-SPQ-2F involving 279 university students, in which a first-order two-factor model with two items deleted was also reported [
10].
Table 2 presents a juxtaposition of the findings of these studies for easy comparison.
The results presented in
Table 2 reveal divergent and inconclusive model solutions. These can be ascribed to some methodological issues inherent in the factor analysis procedure as well as the cultural sensitivity of the instrument. For example, Immekus and Imbrie [
22], after establishing non-admissible solutions of the hypothesized models in [
9], subjected their data to an exploratory factor analysis (EFA). Difficulties arose when some indicators loaded (with loadings greater than |0.3|) on more than one extracted factor. Rather than seeking theoretical explanations for this observation, they opted to delete these indicators from the scales. For instance, item 1 loaded on deep motive (DM) and deep strategy (DS) with obliquely rotated loadings of 0.31 and 0.42, respectively. This could be suggestive of over-factoring in the extraction, especially when this item has been theorized to measure both DM and DS. To support this claim, the high positive correlations of 0.76 and 0.59 reported between DM and DS as well as between surface strategy (SS) and surface motive (SM), respectively, are indications of multicollinearity, which could be addressed by collapsing the subcategories.
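As a rough sketch of how such cross-loadings can be screened, the example below runs a four-factor EFA with promax rotation using the third-party factor_analyzer package and flags items whose loadings exceed |0.3| on more than one factor; the data file, column names, and threshold are assumptions taken from the description above, not from the cited analyses.

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer  # third-party EFA package

def flag_cross_loadings(responses: pd.DataFrame,
                        n_factors: int = 4,
                        threshold: float = 0.3) -> pd.DataFrame:
    """Run an EFA with promax (oblique) rotation and return the rotated
    loadings of items that exceed the threshold on more than one factor."""
    fa = FactorAnalyzer(n_factors=n_factors, rotation="promax")
    fa.fit(responses)
    loadings = pd.DataFrame(fa.loadings_, index=responses.columns)
    salient_per_item = (loadings.abs() > threshold).sum(axis=1)
    return loadings[salient_per_item > 1]   # items with substantial cross-loadings

# Hypothetical usage with columns q1 ... q20:
# print(flag_cross_loadings(pd.read_csv("rspq2f_responses.csv")))
```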
A similar methodological issue is also perceived in the analysis of Stes et al. [
12], with high positive correlations of 0.76 and 0.62 between SI and ST as well as between ME and LH, respectively. Another methodological issue involved in the analyses of Stes et al. [12] and Fryer et al. [21] is the use of the maximum likelihood estimator, which has been found to perform poorly in the analysis of ordinal data (e.g., [
23,
24]). It is also important to remark that SI combined with ST and LH combined with ME are other ways to refer to the DA and the SA, respectively. Later studies (e.g., [
10]) seem to address some of these methodological issues, yet the cultural specificity of the R-SPQ-2F remains an important consideration when the instrument is adapted to a language other than English. Therefore, the current study sought to build on this literature by searching for and evaluating hypothesized models that explain the construct validity of the R-SPQ-2F in the Norwegian context.
4. Results
The first set of results, as presented in Table 3, represents the tested two first-order factor models of the R-SPQ-2F hypothesized in the literature. Analyzed results from the hypothesized model of Xie [
16] are included in
Table 3, and those of Justicia et al. [
15] were excluded, because they both practically advocated the same model and the former is more recent. Notations and abbreviations used in
Table 1 are repeated in
Table 3, with M2 used for Önder and Besoluk [
18], M3 used for Merino and Kumar [
17], and M4 used for Xie [
16].
The results presented in
Table 3 show admissible solutions for the two first-order factor models of the R-SPQ-2F. The negative standardized correlations found between the deep and surface components are suggestive of discriminant validity between these subscales. This could be interpreted to mean that a student with a high score on the deep approach items had a low score on the surface approach items and vice versa, which makes sense and is conceptually sound (a rough observed-score check of this pattern is sketched after this paragraph). However, the high χ²-values (496.21–522.18), coupled with out-of-range fit indices, are indicative of the poor fit of these models. The model proposed by Önder and Besoluk [
18] seems to perform better than the others, with the lowest χ²-value, χ²(167) = 495.21, p < 0.05, and an RMSEA ≤ 0.08. Meanwhile, the two error covariances involved, between item 8 and item 10 as well as between item 11 and item 20, could pose some complications for classroom conceptual understanding and interpretation of scores from this instrument. Therefore, none of these models is statistically and conceptually fit to justify the construct validity of the R-SPQ-2F in the Norwegian context.
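As a rough, observed-score analogue of the negative latent correlation discussed above, the following Python sketch correlates the deep and surface sum scores; the item lists follow the commonly cited scoring key, the column names are hypothetical, and this descriptive check is no substitute for the latent correlation estimated in the CFA.

```python
import pandas as pd

# Item lists taken from the commonly cited R-SPQ-2F scoring key; the column
# names q1 ... q20 are an assumption about how the responses are stored.
DEEP_ITEMS = [1, 2, 5, 6, 9, 10, 13, 14, 17, 18]
SURFACE_ITEMS = [3, 4, 7, 8, 11, 12, 15, 16, 19, 20]

def deep_surface_correlation(responses: pd.DataFrame) -> float:
    """Pearson correlation between the deep and surface sum scores; a negative
    value is consistent with the discriminant pattern described above."""
    deep = responses[[f"q{i}" for i in DEEP_ITEMS]].sum(axis=1)
    surface = responses[[f"q{i}" for i in SURFACE_ITEMS]].sum(axis=1)
    return deep.corr(surface)

# print(deep_surface_correlation(pd.read_csv("rspq2f_responses.csv")))
```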
The second set of results concerns the CFA of the two-factor and four-factor models hypothesized to explain the construct validity of the R-SPQ-2F. The model result of Fryer et al. [
21] was not included in
Table 4 because it has been reported in [
14]. The analyzed result showed a non-admissible solution of the model with a negative error variance on the SM indicator, a result that is suggestive of over-factoring in the model [
14]. Further, analyzed results from the model hypothesized by López-Aguado and Gutiérrez-Provecho [
10] were included in
Table 4, but those of Socha and Sigler [
13] were omitted because both studies practically advocate the same model and the former is more recent. Notations and abbreviations used in
Table 2 are repeated in
Table 4 with M5 used for Immekus and Imbrie [
22]. Mod. M5 was used for modified M5, M7 was used for Stes et al. [
12], Mod. M7 was used for modified M7, and M9 was used for López-Aguado and Gutiérrez-Provecho [
10].
There seem to be indications of good fit in all the models analyzed and reported in
Table 4. The reduced χ²-values, ranging from 152.28 to 301.44, coupled with indices within the suggested ranges, may prompt one to conclude that M5 and M7 are the best models. However, there was evidence of gross misspecification and high multicollinearity between DM and DS, which is suggestive of over-factoring in M5. This is evident from the high standardized correlation coefficient (r = 0.74, p < 0.05) between the DM and DS latent factors. This posed some methodological difficulties in trying to balance the theoretical and conceptual understanding that could yield a substantive interpretation of scores from the instrument. Therefore, an attempt was made to revise this model, as reported under the heading modified model 5 (Mod. M5). Here, the items measuring DM and DS were merged to form one factor (DA), and those measuring SS and SM were merged to form another factor (SA). The resulting two-factor model was subjected to CFA, and selected GOF indices are presented in
Table 4 with the heading Mod. M5. The χ²-value, χ²(89, N = 253) = 289.25, became larger, and all the fit indices were out of range.
The analyzed results of the proposed model by Socha and Sigler [
13] were even worse. The latent variance-covariance matrix was not positive definite, although positive definiteness is a necessary condition for an acceptable model (see [28]). This was observed through the presence of a Heywood case, in the form of a standardized correlation coefficient greater than 1 between the latent factors SI and ST (a small computational check of this condition is sketched after this paragraph). In a manner similar to M5, this model was modified, and the CFA results were reported with the heading Mod. M7 in
Table 4. The resulting χ²-value, χ²(89, N = 253) = 257.02, p < 0.05, is significant but, when combined with the GOF indices, qualifies the model as an appropriate fit to the data [
29]. However, a comparison of this model’s results with the ones reported in Table 4 with the heading M9 [10] favored the latter. This is evident from the higher CFI/TLI and the lower SRMR and RMSEA values observed in M9. Therefore, from the foregoing discussion, what appears to be the best explanation of the factor structure of the R-SPQ-2F in the Norwegian context is the hypothesized model of López-Aguado and Gutiérrez-Provecho [
10].
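To make the positive-definiteness requirement concrete, the small numpy sketch below checks a latent correlation matrix via its eigenvalues; the matrix shown is purely illustrative (it mimics a Heywood case with a correlation above 1) and is not the published estimate.

```python
import numpy as np

def is_positive_definite(matrix: np.ndarray, tol: float = 1e-10) -> bool:
    """A symmetric matrix is positive definite iff all its eigenvalues exceed zero."""
    return bool(np.all(np.linalg.eigvalsh(matrix) > tol))

# Illustrative latent correlation matrix with an out-of-range value (> 1),
# mimicking the Heywood case described above; not the published estimates.
latent_corr = np.array([
    [1.00, 1.04],
    [1.04, 1.00],
])
print(is_positive_definite(latent_corr))  # False: such a solution is inadmissible
```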
5. Conclusions
Teaching is considered successful when accompanied by meaningful learning. A good way to ensure that successful learning takes place is to investigate the approaches students adopt when learning. Several efforts have been expended to promote deep learning in higher education so that emerging leaders will be better prepared for an increasingly diverse society [
30,
31]. Both qualitative and quantitative studies have, for several years, been directed towards proper conceptualization and operationalization of students’ approaches to learning. Prominent among these studies are the works of Marton and Säljö [
5,
6], Entwistle and Waterston [
32], and Biggs [
33]. These have led to the development of measuring instruments, among which the study process questionnaire (SPQ) seems to have gained global attention. However, studies on the cultural specificity of the latest SPQ, the R-SPQ-2F, have generated diverse and inconclusive results.
In this article, investigations were geared towards addressing the issue of the R-SPQ-2F’s cultural specificity when applied in the Norwegian context. Several models were compared, and what seems to be the best explanation of the R-SPQ-2F’s construct validity is a two first-order factor model involving the deep and surface approaches to learning subscales. Meanwhile, by comparing the tested hypothesized model proposed in [
10,
13], as reported in
Table 4, with the results of the appropriate fit (χ²(151, N = 253) = 377.68, p < 0.05, SRMR = 0.072, CFI = 0.844, TLI = 0.824, and RMSEA = 0.077) found in [14], a conclusion could be drawn. There appears to be no obvious statistical difference between the respective χ²-values and GOF indices of these two models. Therefore, a two first-order factor model with ten items measuring the deep approach and nine items (contrary to the eight items in [
10]) measuring the surface approach is still considered the best explanation for the R-SPQ-2F construct validity.
The justification for removing one item from the instrument was explained in detail in the first article. This study is to be followed up with an independent sample, to be collected in the near future, to confirm the proposed model and the predictive validity of the R-SPQ-2F. For classroom decisions, item scores can be interpreted simply by summing the corresponding items on the deep approach (scaled by dividing the sum by 10) and those on the surface approach (scaled by dividing the sum by 9), as sketched below. It is hoped that future replications of this study across other universities and groups of students will be carried out. This instrument is therefore recommended for measuring year-one undergraduate students’ approaches to learning in Norwegian universities. For instructional purposes, both the data and the Mplus syntax used for this study are available upon request from the author.
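A minimal Python scoring sketch follows, implementing the rule described above (deep sum divided by 10, surface sum divided by 9); the column names are hypothetical, and the exact list of the nine retained surface items follows the model reported in [14] rather than anything shown here.

```python
import pandas as pd

def subscale_scores(responses: pd.DataFrame,
                    deep_items: list[str],
                    surface_items: list[str]) -> pd.DataFrame:
    """Scale scores as described above: the deep-item sum divided by 10 and the
    sum of the nine retained surface items divided by 9."""
    return pd.DataFrame({
        "deep_approach": responses[deep_items].sum(axis=1) / 10,
        "surface_approach": responses[surface_items].sum(axis=1) / 9,
    })

# Hypothetical usage; the actual deep and surface item lists follow the model
# confirmed in [14], which is not reproduced here.
# scores = subscale_scores(responses,
#                          deep_items=[f"q{i}" for i in (1, 2, 5, 6, 9, 10, 13, 14, 17, 18)],
#                          surface_items=retained_surface_items)  # nine items
```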