1. Introduction
Extensive research in higher education consistently suggests that student engagement is strongly associated with desirable educational outcomes such as increased learning, persistence in college, and graduation [1,2]. A student’s knowledge, skills, and dispositions at the end of the undergraduate years have much to do with where he or she stands upon entering the world of work. Researchers argue that a university should be measured by how much its students gain while enrolled, rather than by what they achieve on an absolute scale [3,4]. Some students grow more than others during their university years. At present, perhaps the biggest challenge for many higher education institutions and quality assurance agencies is developing and validating assessment instruments that measure learning outcomes in ways that are valid across cultures and languages, and across the diversity of institutional settings.
Theories largely developed and supported by research conducted with students at institutions in one higher education context are likely to fit students at those institutions better than students at institutions in other higher education contexts [5]. Validation studies of student engagement surveys have suggested some degree of distinctiveness and modification when the surveys are used with two-year and four-year college students [6]. There is also evidence suggesting structural modification of the student engagement scale when it is used in a single-institution study as opposed to a large-scale study [7,8,9]. Nevertheless, the development and validation of student engagement and self-reported gains scales, and cross-cultural comparisons of them, have rarely been studied in higher education research [10]. The lack of evidence supporting the cultural validity of constructs such as self-reported gains limits the wider utilization of these constructs across different higher education systems and their customization to fit different circumstances.
The development and validation of a self-reported Learning Gains Scale suited to a particular higher education setting is valuable for examining the cultural relevance of an established construct while, at the same time, adapting it to local conditions to maximize its benefits [11]. A deeper understanding of the psychometric properties of a measuring instrument can be gained through analysis of measurement invariance across groups at several levels. While such analysis is much needed, in practice it requires careful, step-by-step work [12].
Simply stated, Student Learning Gains is a concept that captures the improvements in knowledge, skills, and personal development made by students during their time in higher education [13]. The Student Learning Gains Scale, which is part of the Australasian Survey of Student Engagement (AUSSE), is a widely used outcome survey measuring student learning and development across a range of dimensions [14]. The scale concerns attitudes, values, and self-concept and is ‘more properly used and interpreted as evidence of students’ perceived learning and affective outcomes’ [15]. Although the scale is not a direct measure of students’ learning and development in the way standardized tests are, the self-reported gains questions have sufficient content validity and communality with direct cognitive measures [16].
The Student Learning Gains Scale has predominantly been found within the student engagement survey. There are also corresponding outcome scales, such as graduate and alumni survey tools as well as employability and satisfaction survey scales, that share similar features with the Student Learning Gains Scale [17]. Despite variations in focus, these scales have common characteristics: the responses are self-generated, personal in nature, and used to capture affective learning outcomes [18].
The student engagement survey instrument has been used in Australian universities since 2007 [19]. Like other student engagement surveys, it uses an analytic rating system to gauge the nature and quality of education through five benchmarks, along with overall institutional quality and satisfaction with the service rendered [14]. The AUSSE is typically used to collect information on around 100 specific learning activities and conditions, along with individual demographics and educational contexts. Part of the instrument contains items that map onto seven outcome measures. While cumulative grade point average (CGPA) is captured in a single item, the other six are composite measures reflecting responses to several items, including:
Practical Skills—participation in higher-order forms of thinking;
General Learning Outcomes—development of general competencies;
General Development Outcomes—general forms of individual and social development;
Career Readiness—preparation for participation in the professional workforce;
Average Overall Grade—average overall grade so far in courses;
Departure Intention—non-graduating students’ intention of not returning to study in the following year; and
Overall Satisfaction—students’ overall satisfaction with their educational experience.
Historically, the Student Learning Gains Scale has been represented by a 3-factor solution. The scale originated in North America and has been widely used in the student engagement survey, and its validity has been repeatedly tested in different countries [11,14,18,20].
The three latent factors used in the current study were similar to the original version of the AUSSE: general education; personal and social development; and practical skills. The AUSSE instrument uses 15–16 measurement variables as indicators of the Student Learning Gains Scale, plus the four additional outcome measures listed in the bullet points above. The current study, however, considered 13 indicators of the Student Learning Gains Scale. It deliberately omitted two outcome measures, career readiness and departure intention, because of contextual differences in their relevance. For example, academic failure is more pronounced in the Ethiopian higher education context than departure intention. Moreover, career readiness is a broader institutional issue that falls outside the specific academic issues requiring focused intervention strategies in this study.
The main purpose of the study was to develop and validate a 3-factor model of the Student Learning Gains Scale as applied in the Ethiopian higher education context. A second purpose was to provide readers who may be new to structural equation modeling with an example of how these procedures can be applied in testing the psychometric properties of a measuring instrument. The study emphasized how the scale was contextualized and used in the Ethiopian higher education setting. Confirmatory factor analysis using structural equation modeling was applied to test the construct validity and factorial validity of the Student Learning Gains Scale, along with analysis of measurement invariance across colleges and grade years. More specifically, the study answered the following major research questions.
Do the variables in the Student Learning Gains Scale, in the data collected from a university in Ethiopia, demonstrate construct validity (substantive or content validity)?
Does a 3-factor Student Learning Gains Scale model fit the collected data? If not, what factor structure fits the data well?
Do the Student Learning Gains Scale factors predict important student behaviors and outcomes?
Does the Student Learning Gains Scale, as applied in an Ethiopian higher education context, demonstrate measurement invariance across college type and class year?
2. Materials and Methods
2.1. Study Design
This study used a cross-sectional survey design, employing a self-reported gains scale to collect data from a large sample (n = 536) of undergraduate students at a university in Ethiopia.
2.2. Study Participants
Participants were volunteers recruited from the student population in the College of Natural Sciences and the College of Social Sciences and Law at a large public university in Ethiopia. Both background characteristics and university experiences were considered in selecting study participants. The survey was administered in English to all study participants.
Table 1 presents a summary of the participant characteristics as a percentage of the sample across colleges.
Five hundred and thirty-six undergraduate students (107 females and 429 males) participated in the study, of whom 206 were in the College of Natural Sciences and 330 were in the College of Social Sciences and Law. Males made up over 80% of the sample in both colleges. The mean ages of the student samples in the two colleges were similar, but there was a significant difference in mean CGPA between the two colleges.
2.3. Measures
Student learning gains with respect to college academic experience were assessed using sub-scales in which participants were asked to think about their undergraduate experience while reading each statement and to indicate how true each statement was for them in terms of what they had gained. The Student Learning Gains Scale items began with, ‘To what extent has your experience at this college contributed to your knowledge, skills and attitudinal development in the following areas?’ and were scaled from 1 (very little) to 4 (very much). Self-reported gains were measured using individual student scores on composite measures of the Student Learning Gains Scale: general education, personal and social development, and practical skills outcomes. The general education measure comprises an individual student’s composite score on three general education items. The personal and social development measure comprises a composite score on six items covering professional and social skills and behaviours. Similarly, the practical skills measure comprises a composite score on four items covering higher-order thinking skills and behaviours, including familiarity with and use of ICT in education.
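As an illustration of the composite scoring described above, the following Python sketch averages one student’s item responses into the three composite measures. The item keys are hypothetical placeholders, not the actual AUSSE item labels.

```python
# Sketch of composite scoring for the three gains factors.
# Item keys (ge1, ps1, pr1, ...) are hypothetical placeholders,
# not the actual AUSSE item labels. Items are scored 1 (very little)
# to 4 (very much).

FACTOR_ITEMS = {
    "general_education": ["ge1", "ge2", "ge3"],                     # 3 items
    "personal_social": ["ps1", "ps2", "ps3", "ps4", "ps5", "ps6"],  # 6 items
    "practical_skills": ["pr1", "pr2", "pr3", "pr4"],               # 4 items
}

def composite_scores(responses: dict) -> dict:
    """Average one student's item responses into three composite measures."""
    return {
        factor: sum(responses[item] for item in items) / len(items)
        for factor, items in FACTOR_ITEMS.items()
    }

student = {"ge1": 3, "ge2": 4, "ge3": 3,
           "ps1": 2, "ps2": 3, "ps3": 3, "ps4": 4, "ps5": 2, "ps6": 4,
           "pr1": 4, "pr2": 3, "pr3": 3, "pr4": 2}
scores = composite_scores(student)  # e.g., practical_skills -> 3.0
```

Scoring by the unweighted mean keeps each composite on the original 1–4 response metric regardless of how many items a factor contains.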
2.4. Data Analysis
Both qualitative and quantitative procedures were used to validate the scale before utilizing it in Ethiopia; however, only the quantitative data are used in this paper. The qualitative analyses were used to refine item wording, maintain standards, and assess the appropriateness of the scale. The quantitative analyses helped test the internal consistency of the scale and ensure that it measured the intended target constructs with acceptable levels of bias and precision. Validation followed the multidimensional approach suggested by Griffin, Coates, McInnis and James (2003) and Coates (2006), including experts’ review (in both Australia and Ethiopia), pilot testing and review, and reliability analyses, and extended this work through Confirmatory Factor Analysis (CFA), measurement invariance testing, and regression analysis [21,22]. These later invariance and regression analyses provided additional supporting evidence of criterion validity and discriminant validity. Based on the initial experts’ review, pilot testing, and reliability analysis (n = 74 students), this study modified the wording of a few items and dropped some variables that fell below the acceptable reliability threshold of 0.70 (Nunnally, 1978).
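To illustrate the reliability screening against the 0.70 threshold, here is a minimal Cronbach’s alpha computation in pure Python; the response data are invented for illustration only and are not from the study.

```python
from statistics import variance  # sample variance (n - 1 denominator)

def cronbach_alpha(item_scores):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance).

    item_scores: list of k columns, one per item, each holding that
    item's scores across the n respondents.
    """
    k = len(item_scores)
    totals = [sum(row) for row in zip(*item_scores)]  # per-respondent totals
    item_var_sum = sum(variance(col) for col in item_scores)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))

# Perfectly consistent items give alpha = 1.0; in a pilot analysis,
# variables scoring below 0.70 would be candidates for revision or removal.
items = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
alpha = cronbach_alpha(items)  # -> 1.0
```

In practice, alpha would be computed per factor over the pilot responses (n = 74) rather than over a toy matrix like this one.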
The central concern of measurement invariance is the testing of measurement equivalence across groups [23]. This test can be conducted at different levels, the most common being first-order and second-order models [24]. There are suggested procedures for testing measurement invariance across a hierarchical series of models, and their common purpose is maximizing the interpretability of the results at each step of the hierarchy [25,26]. This study examined first-order and second-order models to provide adequate evidence for the invariance tests conducted; it should be noted, however, that the hierarchy extends to more advanced models [27].
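One widely used (though not the only) decision rule for stepping through such a hierarchy of nested models is Cheung and Rensvold’s ΔCFI criterion: if adding equality constraints (configural → metric → scalar) lowers the comparative fit index by no more than 0.01, the added level of invariance is retained. The sketch below encodes that rule with illustrative fit values, not results from this study.

```python
# Walking a first-order invariance hierarchy with the delta-CFI rule:
# a drop in CFI of <= .01 between nested models supports the added
# constraints. The CFI values below are illustrative only.

STEPS = ["configural", "metric", "scalar"]

def invariance_holds(cfi_prev: float, cfi_curr: float, cutoff: float = 0.01) -> bool:
    """True if the constrained model's CFI dropped by no more than the cutoff."""
    return (cfi_prev - cfi_curr) <= cutoff

cfi_by_step = {"configural": 0.952, "metric": 0.948, "scalar": 0.945}

results, prev = {}, None
for step in STEPS:
    curr = cfi_by_step[step]
    # The baseline (configural) model is accepted on its own fit.
    results[step] = True if prev is None else invariance_holds(prev, curr)
    prev = curr
# metric: 0.952 - 0.948 = 0.004 <= 0.01 -> holds
# scalar: 0.948 - 0.945 = 0.003 <= 0.01 -> holds
```

The actual model fitting behind each CFI value is done in SEM software; this fragment only shows the comparison logic applied at each step of the hierarchy.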
2.5. Missing Values and Internal Consistency
Missing values were managed by excluding incomplete student responses from the analysis. A total of 596 responses was originally collected; however, 60 respondents (10%) were removed from the analysis due to excessive loss or incompleteness of information, along with a few outliers in the age category. The final analysis thus included 536 student responses. A few random missing values remained across the scales. Evaluation of Cronbach’s alpha values showed generally strong consistency with the underlying construct being measured within each factor, with moderately high alpha values greater than or equal to 0.70. The reliability coefficient of the self-reported gains scale (α = 0.89) was high for sample-based research.
4. Discussion
The findings of this study suggest that the self-reported gains scale comprises three interdependent factors: a general education factor, a personal and social development factor, and a higher-order thinking factor. These factors demonstrated reasonable construct validity as measured in terms of convergent validity and discriminant validity. The convergent reliability across all items in the scale (α = 0.89) is higher than that of the individual factors (α = 0.70–0.82); when the correlations among items measuring a scale are relatively high compared with the correlations between factors, there is evidence of both convergent validity and discriminant validity [40]. Moreover, the magnitude of the correlations between the factors provides further supporting evidence for the discriminant validity of the scale, as the factors are not excessively correlated with each other (e.g., >0.85) [32].
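The inter-factor correlation check behind this discriminant-validity argument can be sketched in a few lines of Python. The 0.85 cutoff follows the guideline cited above; the composite scores below are invented for illustration and are not study data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def discriminant_ok(f1, f2, cutoff=0.85):
    """Factors correlating above the cutoff are too entangled to
    count as distinct constructs."""
    return abs(pearson_r(f1, f2)) < cutoff

# Illustrative composite scores for two factors (not study data):
general_education = [3.0, 2.5, 3.5, 2.0, 4.0, 3.0]
practical_skills = [2.5, 2.0, 3.0, 2.5, 3.5, 3.5]
# r is about 0.70 here, below the 0.85 cutoff.
ok = discriminant_ok(general_education, practical_skills)
```

In a latent-variable analysis the inter-factor correlations come from the fitted CFA model rather than from raw composite scores, but the decision rule is the same.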
In this study, there were moderate to high positive correlations between the validity variables and the Student Learning Gains Scale, 0.31 < r < 0.53, p < 0.001. The strongest correlations were those between the Student Learning Gains Scale factors and student self-efficacy and overall satisfaction [41]. By comparison, the three gains factors correlated only moderately with student future intention, r = 0.31 and 0.32.
Factorial validity, measurement invariance, and criterion validity were tested using statistical tests and practical indices, providing evidence of the adequacy of the construct for measuring student learning gains across samples of participants. There is thus supporting empirical evidence that the scale has reasonable psychometric properties for use in the Ethiopian higher education context as a primary means of sharing results and comparing institutions. This is consistent with research conducted in different parts of the world that supports the usefulness and practicability of the gains scale [14,18,42].
Empirical evidence shows that students are credible sources of information on what they have experienced in universities and how much they have benefited from their learning experiences [43]. Students may be best qualified to describe what they have gained from their time at a university, particularly in areas such as affective outcomes and practical skills [44]. For better results, however, items should be clearly worded and students should have the information required to answer the questions accurately under the prevailing conditions [10]. Research shows that students respond carefully and with personal interest to the content of such questionnaires [43]. The present research provides initial evidence of the practicability of the self-reported gains scale for a single-institution study in the Ethiopian higher education context, as the findings demonstrated sufficient psychometric properties. For example, the construct validity and factorial validity of the items used in this self-reported gains scale were adequate.
The current study’s exclusive reliance on self-report measures of student learning gains is a limitation. The inclusion of students from only two colleges also limits the generalizability of the findings. These limitations must be considered in interpreting the conclusions presented in this study.