Article

Universal Design for Learning: The More, the Better?

Institute for Science Education, Leibniz Universität Hannover, 30167 Hannover, Germany
Author to whom correspondence should be addressed.
Academic Editor: James Albright
Educ. Sci. 2021, 11(4), 164; https://doi.org/10.3390/educsci11040164
Received: 4 March 2021 / Revised: 22 March 2021 / Accepted: 27 March 2021 / Published: 1 April 2021
(This article belongs to the Section Special and Inclusive Education)

Abstract

A quasi-experimental study investigated the effects of applying principles of Universal Design for Learning (UDL). Focusing on epistemic beliefs (EBs) in inclusive science classes, we compared four groups who worked with learning environments based to a greater or lesser extent on UDL principles and filled out either the original version of a widely used EBs questionnaire or a version adapted according to Universal Design for Assessment (UDA). Based on measurement invariance analyses, a multiple indicator, multiple cause (MIMIC) approach, and multi-group panel models, the results do not indicate that the extensive UDL environment outperformed the focused one. Moreover, the UDA-based questionnaire appears better suited for detecting learning gains in an inclusive setting. The results emphasize how important it is to carefully adopt and introduce the UDL principles for learning, and to pay attention to test accessibility when conducting quantitative research in inclusive settings.
Keywords: universal design for learning; universal design for assessments; science education; epistemic beliefs; inclusive science teaching

1. Introduction

Universal Design for Learning (UDL) provides a theoretical framework for designing teaching that addresses the accessibility of learning content and welcomes students' diversity. Accessibility is understood here in terms of minimizing barriers, an idea that is central to many approaches to implementing inclusive teaching in order to achieve participation in learning for all students [1]. Participation in education for all students is a current social and political challenge. The increasing diversity of learners should be met positively and serve as a resource. In this context, inclusion is defined as an appreciative and welcoming approach to diversity [2,3]. UNESCO also addresses education in the Sustainable Development Goals, which explicitly call for inclusive education: by 2030, education systems worldwide should be adapted to be more equitable to the diversity of learners. All people, regardless of background, should have access to education and be able to participate in it, thereby also realizing the right to education [4].
The basic assumption of UDL is that monomodal teaching approaches tend to focus on the "average student" and create barriers for many other students. Multimodality of a learning environment is created by multiple forms of representation, processing, and motivational or motivation-maintaining elements in the learning environment [5]. Metaphorically speaking, UDL puts the what, the how, and the why of learning at the center of lesson planning. The concept of UDL has been widely used in many approaches all around the world [6,7,8,9]. It can be shown that all students, not just those with special education needs, can benefit from a UDL-based learning environment [10]. It has also served as a guideline for systemic educational reform after the COVID-19 pandemic [11].
However, UDL is not free from criticism. It is seen as a very complex framework that is, on the one hand, very inspiring to educators but can, on the other hand, become arbitrary when it comes to concretizing and operationalizing the guidelines [12,13]. It is questionable whether UDL is defined precisely enough to derive clear interventions and to isolate the active components that make UDL effective. Moreover, what remains unclear is "the dosage of UDL intervention needed to achieve access, engagement, and success" [14] (p. 1). One of the main points of critique is that evidence from clear-cut, rigorous studies focusing on the impacts of UDL is largely missing [15]. While studies often report a positive perception of the subjects' learning process and the teaching material [9], the meta-analysis by Capp [7] shows that there are hardly any studies reporting on learning gains, and that UDL research focuses primarily on the principle of multiple representations, only one of the three principles of the UDL guidelines [5]. Al-Azawei et al. [6] came to a similar conclusion. This is problematic because UDL does not necessarily require implementing all principles, guidelines, and checkpoints. However, because UDL sees itself as a holistic model for lesson planning, "the learning outcomes associated with the implementation of UDL need to be demonstrated through experimental studies within curriculum areas" [7] (p. 804).
In this regard, we report a quasi-experimental study on how systematically varying the number of applied UDL principles affects students' learning gains. Coming from a science education background, we focus on fostering students' epistemology in science, which is considered an important learning goal at an international level [16,17].

2. Theoretical Background

2.1. Epistemic Beliefs in Science

Epistemic beliefs (EBs) are individual beliefs about the nature of knowledge (beliefs about what knowledge is) and the nature of knowing [18]. Epistemic beliefs are conceptually close to the students’ views on the nature of science [19]. They are part of students’ epistemic cognition which can be described as, “the thinking people do about what and how they know” [20] (p. 457).
One line of conceptualizing EBs in science consists of defining multiple dimensions [21]. Although there is still an ongoing debate on the specific dimensions that can be defined [22], we referred to the widely used four-dimensional approach to epistemic beliefs in science as described by Conley et al. [23]. According to this approach, epistemic beliefs comprise beliefs about the certainty, development, source, and justification of knowledge in science. Each of the four dimensions spans a continuum indicating how pronounced the corresponding factor is, ranging in each case from a naïve to a sophisticated level (Table 1) [24].
These dimensions have been used in a whole range of studies. There is a body of evidence supporting their importance for learning processes [25], their dimensional structure [18,26], as well as their relation to academic achievement [27]. Recent meta-analytical perspectives support the assumption that EBs can be fostered during intervention studies, either as the focus of an intervention or in the role of a co-construct that supports learning processes [28].
Fostering students’ epistemic cognition is one of the grand goals of science education efforts all around the world. Rather sophisticated EBs can be seen as a prerequisite to understanding fundamental aspects of science and of how science is represented and discussed in—and influences—society. This is one part of enabling societal participation [29,30]. Having this in mind, EBs appear particularly important for an inclusive science education that focuses on helping all students participate in a society that is, to a large extent, shaped by science and technology and does not only aim at providing a later STEM workforce for economic or academic purposes. For these reasons, we decided to focus on fostering EBs in inclusive science classrooms, comparing an extensive and focused UDL-setting.

2.2. Universal Design for Learning

Universal Design for Learning (UDL) was developed by the Center for Applied Special Technology (CAST) [5]. UDL offers several instructional adaptation options to reach every student regardless of their prerequisites. UDL-based instruction provides multiple ways to present information (the "what" of learning), to process information and present learning outcomes (the "how" of learning), and to promote learning engagement and motivation (the "why" of learning) [5,31]. The three principles are subdivided into nine guidelines, which are described in Table 2. The focus is on the individual, so that barriers to accessibility are minimized. Thus, it is not the learner who must adapt, but the classroom [2,5]. Educators can use the UDL principles to create flexible learning pathways through which learners can achieve their learning goals. All learners can thus be addressed by choosing different methods, materials, and assessments based on their individual needs [32]. Applying the UDL principles does not require digital technology, because the educational effort to reach all learners is the focus. Digital implementations, however, promise the advantage of reaching learners in different ways, for example, by reading a text aloud or using videos to convey the learning content [33].

2.3. Universal Design for Assessment

While UDL offers the opportunity to minimize barriers in learning environments, the assessments used for evaluation can reintroduce barriers that significantly influence the results. Capp [7], Edyburn [12,14], and Gregg and Nelson [34] explicitly point out that assessment should receive more attention when integrating UDL into learning environments. One way to minimize these barriers and increase accessibility is the Universal Design for Assessment (UDA) framework [35,36,37]. UDA is designed to enable participants to achieve the best possible test scores regardless of personal characteristics that are irrelevant to the test construct. In doing so, UDA focuses on decreasing construct-irrelevant variance [38]. Similar to UDL, essential elements can also be formulated for UDA (Table 3).

3. Research Question

All in all, our aim was to carry out a quasi-experimental study investigating the impact of an extensive versus a focused UDL setting on the development of EBs in science. We therefore designed and compared two learning environments based on different numbers of UDL principles. We also tried to be sensitive to barriers in research in inclusive settings that might affect research results and hinder participation in testing. Thus, we aimed at testing the effect of adapting an internationally published epistemic beliefs questionnaire using the concept of UDA. More concretely, we focused on the following research questions:
  • Does adapting a widely used instrument according to UDA principles affect the results of the study?
  • To what extent can epistemic beliefs be fostered in inclusive science classes using the concept of UDL?
  • How does an extensive or a more focused use of UDL principles impact learning outcomes in the field of epistemic beliefs?
This study was part of the dissertation project of one of the authors where further information can be found [39].

4. Materials and Methods

4.1. Description of the Learning Environments

Both learning environments were based on the UDL principles. While one referred only to the principle of multiple representations ("MR environment") and contained a video, the second learning environment addressed more UDL principles ("UDL environment"). The extended UDL environment included a comic and an interactive pop-up text in addition to the video from the MR learning environment. It contained more features and customizations, as shown in Table 4. The operationalization of the UDL guidelines drew on research findings from test development and evaluation [38,40] and research on digital learning environments [41]. The learning environment was created with iBooks Author in e-book format [42] and is described more concretely in an article addressing educators in practice [43]. Note that a single operationalization can contribute to several UDL guidelines.
Both learning environments showed two scientists holding different hypotheses about the question being addressed: Does the same amount of a substance also have the same weight? This question was related to everyday experiences as well as to the concept of density, a fundamental but rather abstract science concept. Students in the federal state of Lower Saxony, Germany, where this study took place, are expected to encounter density while learning about sinking and floating. The learning environment aimed to teach the purpose of experiments (testing hypotheses) and the planning of experiments. At the beginning of the learning environment, learners were given an overview of the intended learning goals: (1) with experiments, chemists answer their questions, (2) ideas are possible answers to the questions, (3) with experiments, chemists test their ideas, (4) scientists plan an experiment. The learning environment can be seen in Figure 1. Using a self-assessment tool, the students started to reflect on how the scientists proceed to figure out whose hypothesis should be accepted. Students then engaged in a hands-on activity using everyday materials. They generated data and reflected on the hypotheses as well as on the procedures they and the scientists used to generate knowledge. This fostered beliefs such as that experiments are used to test ideas and that experiments justify scientific knowledge; reflection on data from experiments likewise served justification purposes. Thus, the justification of scientific knowledge was the main EB dimension fostered.
However, students also had opportunities to reflect upon further EB dimensions: the controversy between the scientists, examined through the students' own experiments, may also foster beliefs that scientific knowledge is subject to change (development), that students can test scientific knowledge themselves and do not have to rely on authorities (source), and that scientific knowledge should be reflected on from more than one perspective (certainty).
The learning environments were based on the theoretical framework of easy language [44]. With the selected materials for the experiment, both hypotheses, "equal amount is not necessarily equal weight" and "equal amount is also equal weight," could be investigated. The following materials were provided for this purpose: sand, salt, sugar, plastic measuring cylinders, scales, and spatulas.

4.2. Preliminary Study

As part of a pre-study, guided interviews were conducted to develop and evaluate the learning environments and the three content representation forms (video, comic, and interactive pop-up text). The accessibility of the learning environment was tested through this approach. The data were analyzed with a qualitative content analysis [45]. The pre- and post-interviews lasted 10 min each; working with the learning environment lasted 30 min.
For the preliminary study, 36 learners from 5th to 7th grade were interviewed in a guideline-based approach. Nine of them indicated a diagnosed need for special educational support. The intervention was carried out in groups of four, while the pre- and post-interviews were conducted individually. Learners were assigned in equal numbers to the representational forms: video, text, and comic. The basis for the evaluation was the coding manual of Carey et al. [46]. When intraindividual changes were included, the results showed that the video-based representation had advantages over the pop-up text, but not over the comic-based one. A correlation analysis examining the distribution of levels in the learning environment, however, indicated that the video was not superior.
Furthermore, the interviews provided insight into the learners' abilities. For this purpose, the interviews were coded with regard to the hypothetical-deductive way of experimenting [47] and unsystematic trying out ("look and see") [46].

4.3. Design of the Main Study

A 2 × 2 between-subjects design with a pre-post assessment was selected for the main quantitative study (Table 5). This approach allows differences in learning environments and assessments to be examined. Learners were randomly assigned to one of the experimental groups, ensuring that each study condition was represented in each school class. The intervention lasted 90 min.
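Such within-class randomization can be sketched as follows; the roster, condition labels, and function name are purely illustrative and not taken from the study.

```python
import random

def assign_conditions(class_roster, conditions, seed=0):
    """Randomly assign learners within each class so that every
    study condition is represented in every school class."""
    rng = random.Random(seed)
    assignment = {}
    for class_name, students in class_roster.items():
        shuffled = students[:]
        rng.shuffle(shuffled)
        # cycle through the conditions within each shuffled class
        for i, student in enumerate(shuffled):
            assignment[student] = conditions[i % len(conditions)]
    return assignment

# hypothetical rosters and the four cells of the 2 x 2 design
roster = {"5a": ["s1", "s2", "s3", "s4", "s5"],
          "5b": ["s6", "s7", "s8", "s9"]}
conditions = ["UDL+UDA", "UDL+orig", "MR+UDA", "MR+orig"]
assignment = assign_conditions(roster, conditions)
```

Because the conditions are cycled within each shuffled class, any class with at least four students covers all four cells of the design.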
The standard assessment of Kampa et al. [24] was used to capture all four EB dimensions. In a further step, this assessment was adapted to conform to UDA. For this purpose, the concept of easy language was utilized [44], and experts (two from German studies, two from special needs education, and two from science education) were consulted to verify linguistic and content accuracy. In this way, the comparability of both assessment forms was secured. Furthermore, a larger text layout and a more everyday response format in the form of stars were chosen (Figure 2). Exemplary wording of the justification scale items in the UDA assessment can be found in Table 6.
For an extended evaluation, additional learner characteristics were collected via a paper-and-pencil test and iPad-based tests. The selection of learner characteristics was theory-based and is necessary for a broad understanding of inclusion, as it is not sufficient to focus only on special educational needs. In addition to reading ability and cognitive skills, socioeconomic status, cognitive activation, perception of learning success, as well as gender, age, and diagnosed support needs were assessed (Table 7). We chose these characteristics as they are particularly suitable for describing and quantifying the diversity of the learning groups who participated in this study. We are aware that these characteristics may play a part in categorizing children, contradicting the basic idea of inclusion. However, at least in Germany, characteristics like reading literacy or socioeconomic status have a major impact on school success. We therefore decided to include these characteristics in our study, as the information gained may help advance inclusive teaching.

4.4. Sample

The main study included 348 learners (male = 189; female = 193; mean age 12.2 (SD 0.74)). The learners were from integrated comprehensive schools (IGS) in Lower Saxony, Germany. IGSs stand out in Germany for being the first schools to implement inclusive education. Sixteen learners required special needs education (learning = 12, language = 4), corresponding to a proportion of 4.6% and therefore above average compared to the 3.9% at general education schools in Lower Saxony in the school year 2014/15 [53].

4.5. Procedures of Data Analysis

As a first step, we compared the original with the UDA test version and identified a set of items present in both versions to evaluate the development of EBs. For this purpose, we calculated and compared McDonald's ω as a reliability coefficient [54] and conducted analyses of measurement invariance using longitudinal confirmatory factor analyses (LCFA). We also checked for instructional sensitivity [55] by using a multiple indicator, multiple cause approach (MIMIC approach; as applied, for example, in Sideridis et al. [56]). By introducing a variable representing the type of learning environment as a predictor of the latent factor as well as of the items in the longitudinal model (pre- and post-test), the MIMIC approach is suitable for indicating differences between both test versions in measuring the development of EBs. We also used t-tests at the item level to check for differences between pre- and post-test. Based on these analyses, we identified a comparable set of items for further analyzing the effects of the learning environments.
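As an illustration, McDonald's ω for a one-factor model can be computed directly from standardized factor loadings; the loading values below are hypothetical and do not come from the study.

```python
import numpy as np

def mcdonalds_omega(loadings):
    """McDonald's omega from the standardized loadings of a
    one-factor model:
        omega = (sum lambda)^2 / ((sum lambda)^2 + sum(1 - lambda^2))
    where 1 - lambda^2 is each standardized item's residual variance."""
    lam = np.asarray(loadings, dtype=float)
    common = lam.sum() ** 2                 # variance due to the common factor
    error = (1.0 - lam ** 2).sum()          # summed residual variances
    return common / (common + error)

# hypothetical standardized loadings for a four-item short scale
omega = mcdonalds_omega([0.7, 0.6, 0.65, 0.55])
```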
To gain insights into the UDA implementation, in a second step we used this set of items to re-check measurement invariance and to compare the accessibility of the assessment versions using a graphical analysis of differential item functioning (DIF). We compared item difficulty for each subgroup, using the learner characteristics data to build subgroups within the sample. The mean scores for reading literacy, intelligence, and socioeconomic status were calculated; the proportion of students with special educational needs in the sample, however, was too small for a separate evaluation. A difference of one standard deviation from the mean was chosen as the cut-off criterion for forming groups. Differences in item difficulty for a particular subgroup would indicate differences in test accessibility with regard to a trait important for the diversity of learners [36].
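The subgroup formation and item-difficulty comparison described above can be sketched as follows; the one-standard-deviation cut-off follows the text, while the data and function names are simulated and illustrative.

```python
import numpy as np

def split_groups(trait, sd_cutoff=1.0):
    """Split learners into low/high subgroups at +/- one standard
    deviation from the mean, mirroring the DIF cut-off criterion."""
    mean, sd = trait.mean(), trait.std()
    low = trait <= mean - sd_cutoff * sd
    high = trait >= mean + sd_cutoff * sd
    return low, high

def item_difficulties(responses, mask):
    """Item difficulty as the mean response per item within a subgroup;
    large low/high gaps on single items would suggest DIF."""
    return responses[mask].mean(axis=0)

rng = np.random.default_rng(0)
reading = rng.normal(100, 15, size=300)    # simulated reading literacy scores
items = rng.integers(1, 6, size=(300, 4))  # simulated 5-point responses, 4 items
low, high = split_groups(reading)
gap = item_difficulties(items, high) - item_difficulties(items, low)
```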
In a third step, we specified a multi-group panel model that included the pre- and post-tests as well as the learner characteristics and the type of learning environment (UDL or MR) as covariates. This allows us to model, in one step, the learning gains in the context of EBs, the impacts of the learner characteristics, and the impact of the type of learning environment. If the covariate learning environment showed a significant relation to the EB measures, this would indicate an outperformance of the UDL environment (as the UDL environment was coded with one). In order to acknowledge the students' individual learning gains in this quantitative setting, longitudinal plots were calculated. These plots draw one line per student across the measurement points, showing the whole range of the sample, as well as a mean line indicating the average learning gain of the whole sample.
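Such longitudinal ("spaghetti") plots can be sketched with standard plotting tools; the data below are simulated and the function name is illustrative.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

def longitudinal_plot(pre, post, path="justification_plot.png"):
    """One thin line per student across both measurement points, plus a
    bold mean line indicating the average learning gain of the sample."""
    fig, ax = plt.subplots()
    for y0, y1 in zip(pre, post):
        ax.plot([1, 2], [y0, y1], color="grey", alpha=0.3, linewidth=0.8)
    ax.plot([1, 2], [np.mean(pre), np.mean(post)],
            color="black", linewidth=2.5, label="sample mean")
    ax.set_xticks([1, 2])
    ax.set_xticklabels(["pre", "post"])
    ax.set_ylabel("justification score")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)
    return float(np.mean(post) - np.mean(pre))  # mean learning gain

rng = np.random.default_rng(1)
pre = rng.normal(3.4, 0.5, size=50)               # simulated pre-test scores
gain = longitudinal_plot(pre, pre + rng.normal(0.2, 0.3, size=50))
```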
As the learning environments mainly focused on fostering the justification dimension, we will mainly present the results for this dimension. This will then be discussed to provide consistent and structured insight into the presentation of results. All further results will be provided in Appendix A.

5. Results

5.1. Step One: Item Selection

In a first analysis, the items of the scales were analyzed and evaluated following the process described above. The analyses showed that items with a low factor loading are those that do not show a significant mean change in either assessment form (Table 8). Items with a sufficiently high standardized factor loading were selected for the new formation of the scales, and items with significant mean changes despite a low standardized factor loading were also included. Consequently, items 2, 3, 4, and 6 of the justification scale were relevant for further analysis and were combined into a short scale. The results of the selection process are documented in Appendix A. The other three scales were analyzed accordingly, and their number of items was likewise reduced to form short scales.
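As a minimal sketch, the selection rule above can be expressed as a simple filter; the loading cut-off and the item values below are hypothetical, chosen only so that the example reproduces a selection of items 2, 3, 4, and 6.

```python
def select_items(loadings, mean_changed, loading_cutoff=0.5):
    """Keep items with a sufficiently high standardized loading, plus
    items whose pre-post mean changed significantly despite a low
    loading (the selection rule described in the text).
    The cutoff value is illustrative only."""
    keep = []
    for item_no, (lam, changed) in enumerate(zip(loadings, mean_changed), start=1):
        if lam >= loading_cutoff or changed:
            keep.append(item_no)
    return keep

# hypothetical loadings and significance flags for six justification items
short_scale = select_items([0.31, 0.62, 0.58, 0.55, 0.28, 0.47],
                           [False, True, True, True, False, True])
```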
The internal consistency of the short scales was in an acceptable to good range, with the exception of the justification scale in the original assessment. An increase in the consistencies from the first to the second measurement point can be seen (Table 9).

5.2. Step Two: Checking Test Accessibility of Both Versions

Longitudinal measurement invariance (MI) testing of the justification short scale showed that the data supported configural, metric (∆CFI = −0.007; ∆χ2 = 5.87, p = n.s.), and full scalar MI (∆CFI = 0.002; ∆χ2 = 8.55, p = n.s.). The quality criterion for strict MI was not met (∆CFI = 0.013), but the χ2-difference test did not establish a significant difference between the contrasted models (∆χ2 = 12.73, p = n.s.). The remaining fit indices were in the good to very good range. Consequently, it can be assumed that the data supported strict MI (Table 10).
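The dual decision rule applied here, a ∆CFI cut-off of .01 combined with the χ²-difference test, can be sketched as follows; the degrees of freedom in the example call are hypothetical.

```python
from scipy.stats import chi2

def invariance_step_supported(delta_cfi, delta_chi2, delta_df,
                              cfi_cutoff=0.01, alpha=0.05):
    """Judge a nested invariance step by the common dual criterion:
    |dCFI| <= .01, or a non-significant chi-square difference test."""
    p_value = chi2.sf(delta_chi2, delta_df)  # survival function = upper tail
    return bool(abs(delta_cfi) <= cfi_cutoff or p_value > alpha)

# e.g. strict invariance: dCFI = .013 misses the cut-off, but the
# chi-square difference test stays non-significant (df is hypothetical)
supported = invariance_step_supported(0.013, 12.73, 8)
```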
For the test accessibility analysis, data from the short scales at the first measurement point were used, since at the second measurement point there was already an influence of the learning environment. The learner characteristics reading literacy, intelligence, gender, and socioeconomic status were used to examine the items' group dependency. We chose one standard deviation from the mean as a cut-off parameter to form groups for the analysis.
Regarding the statistical parameters, no significant differences in item difficulty could be observed between the two assessment versions (reading literacy: t(328) = 1.65, p = n.s.; intelligence: t(337) = 1.34, p = n.s.; socioeconomic status: t(332) = 0.21, p = n.s.). Using these criteria, a total of 137 at-risk learners who met at least one criterion could be identified, corresponding to 40% of the total sample.
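Such a comparison can be sketched with a two-sample t-test (here Welch's variant, which does not assume equal variances) on simulated data; the scores and group sizes below are synthetic and do not reproduce the study's values.

```python
import numpy as np
from scipy.stats import ttest_ind

# simulated mean item-difficulty scores per learner for both test
# versions; in the study, no significant differences emerged
rng = np.random.default_rng(2)
uda_scores = rng.normal(3.5, 0.6, size=170)
orig_scores = rng.normal(3.5, 0.6, size=160)

# equal_var=False selects Welch's t-test
t_stat, p_value = ttest_ind(uda_scores, orig_scores, equal_var=False)
```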
The items of the justification scale of the UDA assessment showed measurement invariance in all four group comparisons (Figure 3). The original assessment showed a similar picture except for item 2: there, students with higher intelligence were more successful than those with lower intelligence (Figure 4). In addition, a lower socioeconomic status led to a higher solution probability. Overall, the UDA assessment was slightly more accessible than the original assessment at the first measurement point regarding the justification scale.

5.3. Step Three: Checking on Learning Gains and Differences between UDL and MR Environment

Figure 5 shows the multi-group panel model of interindividual changes in the justification scale. It takes into account the learner characteristics as well as the impact of the learning environment for the UDA and the original test version. Only significant paths are shown. In the EB dimension justification, the mean increases significantly (measurement point 1: UDA assessment M = 3.44, original assessment M = 3.68; measurement point 2: UDA assessment M = 3.67, original assessment M = 3.75) (Figure 5). At the first measurement point, the regressions of the latent construct on intelligence and reading literacy are significant. The regressions of the constructs on each other are the largest of all comparisons. Students with special educational needs show lower scores on the original assessment at the second measurement point than those without special educational needs. Most importantly for research question three, no significant impact of the learning environment can be established: students learning with the UDL environment did not outperform students learning in the MR environment. The learning gains measured by the original test version were not as high as those measured by the UDA version.
Furthermore, the longitudinal plots of the justification scale are shown below. In both assessment variants, the average increases from the first to the second measurement time point (UDA assessment: y = 0.229 × x + 3.376; M1 = 3.61, M2 = 3.83; original assessment: y = 0.076 × x + 3.61; M1 = 3.69, M2 = 3.76) (Figure 6). The UDA version indicated a comparable increase in EB development whereas the original version showed increases as well as decreases.

6. Discussion

Summarizing the Results and Answering the Research Questions

Using a quasi-experimental study, we investigated the impact of an extensive versus a focused UDL setting on the development of epistemic beliefs in science. We used a 2 × 2 between-subjects design to examine the impact of adapting an EB questionnaire for research in inclusive settings.
Regarding the first research question, our results show that the UDA version has more adequate statistical properties. The UDA assessment has a higher overall inter-item correlation than the original one. Furthermore, the internal consistency of both assessment variants increases towards the second measurement point; for the UDA assessment, however, McDonald's ω already indicates a higher consistency at the first measurement point. Comparing the learning gains, the UDA-based version indicates an increased acceptance of sophisticated views on the justification of scientific knowledge, whereas the original version indicates an increased variance with a comparably stable mean: students showed an increased as well as a decreased acceptance of sophisticated views.
We assume that this effect is due to test barriers in the original questionnaire. We think that students working with the UDA version understood the items better at the first measurement point. Some students working with the original version might have needed the learning environment to elaborate their understanding of the items; at the second measurement point, they might then have answered the original version in what they deemed a purposeful manner. This could lead to a decreased acceptance of sophisticated beliefs, so that the original version might not fully capture the elaboration of beliefs. To follow up on this finding, qualitative studies involving cognitive interviews, such as proposed by Kuusela and Paul [57] or Ryan, Gannon-Slater, and Culbertson [58], might be fruitful. They would, however, have to reflect the diversity of students in inclusive learning settings.
Regarding the second and third research questions, the UDA version of the questionnaire indicates an elaboration of students' views on the justification of scientific knowledge. However, in the multi-group panel models, the variable "learning environment" shows no significant impact. This means that we could not detect that students learning in the extensive UDL environment outperformed those who learned in the MR environment.
We discuss these findings with regard to four implications:
Implication no. 1: In inclusive settings where quantitative research is conducted, test accommodation plays a significant role. Quantitative instruments should be used with care.
Aiming at a barrier-minimized learning environment is undoubtedly a good step toward enabling all students to participate. When conducting research, however, barriers can be set up again, which can disadvantage particular students and lead to biased research results. Adjustments such as extending the processing time or reducing the number of items do not seem appropriate [59,60,61]. The principles of UDA, in contrast, allow barriers to be minimized without changing the target construct. When researchers minimize barriers in an assessment, it is important not to change the actual target construct, in order to avoid unsystematic scoring patterns [62]. Within the framework of UDA, further adjustments are also possible and beneficial. Zydney, Hord, and Koenig [63] show that video-based assessments for students with learning disabilities may be an excellent way to minimize barriers. Furthermore, there is a need to investigate accessibility through auditory representations [64].
Future projects in UDL-oriented research that contain quantitative approaches might benefit from adding qualitative research on the assessment, the processes of working with the assessment tool, and its possible barriers.
Implication no. 2: The UDL principles should be applied with care. “The more, the better” does not seem to be applicable.
This study could not detect significant advantages of the extensive UDL learning environment. Of course, non-significant findings might be explained by methodological effects such as too much error variance in the data. The reliability and DIF analyses, however, indicate an acceptable amount of noise in the data. The effect of applying more UDL principles simply does not seem strong enough to hold its ground against the remaining noise.
It is more likely that using a video as a tool containing multiple representations might be enough to decrease barriers for elaborating EBs. This was also shown in the preliminary study where interviews indicated that the embedded video already had advantages over the other representations. Since both learning environments use the video, the advantages of the UDL over the MR learning environment may be leveled out.
Implication no. 3: The UDL principles should be introduced with care. "The more, the better" might not be applicable in the long run. UDL also means changing a learning culture.
As this study was carried out with students in inclusive schools who did not work with UDL, the UDL learning environment might have been too complex to outperform the MR framework in the first place. We do not have any data on when students become familiar with learning with UDL environments. Since UDL is fundamentally different from monomodal teaching, its integration into the school routine may need to be ritualized over a longer period to unleash the full potential.
Implication no. 4: An unanswered question is how students’ learning behavior in a UDL learning environment leads to increased outcomes for all students. Learning analytics could fill this research gap.
The learning environment was technically realized as an eBook app. When the study was carried out, it was not possible to track the students’ learning progress. Qualitative research might be one way to gain more insight into the learning processes. Given the multitude of students’ characteristics, future research may also draw on technological advances in learning analytics and machine learning, in the sense of collecting and analyzing page-view times, the general usage of the eBook contents, or clickstreams. This makes it possible to process vast amounts of data “intelligently”, beyond human capability: patterns can be detected, learning paths can be recorded, and extensive analyses can be performed. One challenge for future research is balancing the individuality of students’ learning against the categories that learning analytics and machine learning systems would use to make sense of it. Systems that track students’ learning paths with machine learning already exist. With the help of log files, it is possible to identify students “gaming” the system [65] or to identify student modeling practices more extensively than was previously possible [66]. The need for and benefit of machine learning systems are also evident with respect to the UDA: an “intelligent” system could adapt the assessment individually to the student [63].
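To illustrate the kind of low-level data such learning-analytics approaches would operate on, the following sketch aggregates per-page viewing times from a hypothetical event log. The log format, field names, and values are assumptions for illustration, not the eBook app's actual logging.

```python
from collections import defaultdict

# Hypothetical eBook log events: (student_id, page, timestamp_in_seconds)
events = [
    ("s1", "video", 0), ("s1", "text", 40), ("s1", "quiz", 100),
    ("s2", "text", 0), ("s2", "video", 30), ("s2", "quiz", 90),
]

def page_view_times(log):
    """Sum the time each student spent on each page (time until the next event)."""
    per_student = defaultdict(list)
    for student, page, t in log:
        per_student[student].append((t, page))
    totals = defaultdict(float)
    for student, visits in per_student.items():
        visits.sort()  # order events chronologically per student
        for (t0, page), (t1, _) in zip(visits, visits[1:]):
            totals[(student, page)] += t1 - t0
    return dict(totals)

print(page_view_times(events))
```

Aggregates of this kind (dwell times, page sequences) are the raw material from which clickstream patterns and learning paths could later be mined.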
All in all, this study offers a confirmatory approach to the UDL literature that addresses an important research gap [6,7,13]. Our results might contribute to raising even more questions than we can answer in one study. Therefore, proposition #10 stated by Edyburn in 2010 [12] (p. 40), “UDL Is Much More Complex Than We Originally Thought”, still seems applicable.

Author Contributions

Conceptualization, M.R., M.W. and A.N.; formal analysis, M.W.; investigation, M.W.; writing—original draft preparation, M.R.; writing—review and editing, M.W. and A.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

For further data, please contact the authors.

Acknowledgments

Special thanks go to the project “Didaktische Forschung”.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Appendix A.1. Process of Item Selection for the Justification Scale

Table A1. The internal consistency of the justification scale before we shortened the item set [39].

EBs scale            Measurement point 1   Measurement point 2   Both measurement points
UDA assessment
  Source             0.59                  0.83                  0.76
  Certainty          0.81                  0.87                  0.87
  Development        0.86                  0.89                  0.90
  Justification      0.83                  0.84                  0.87
Original assessment
  Source             0.70                  0.76                  0.79
  Certainty          0.78                  0.85                  0.87
  Development        0.83                  0.90                  0.89
  Justification      0.52                  0.77                  0.74
Table A2. Measurement invariance of the justification scale before we shortened the item set [39].

Stage           Chi-square   df    p       RMSEA   CFI     TLI     SRMR    Accepted?
Configural      206.61       138   <0.05   0.065   0.929   0.906   0.061   Yes
Metric          227.48       156   <0.05   0.062   0.926   0.914   0.077   Yes
Scalar          274.02       174   <0.05   0.070   0.896   0.892   0.083   No
Partial scalar  259.89       172   <0.05   0.066   0.909   0.904   0.083   Yes
Strict          285.33       190   <0.05   0.065   0.901   0.905   0.086   Yes

Explanatory note: df: degrees of freedom; CFI: Comparative Fit Index; RMSEA: Root Mean Square Error of Approximation; TLI: Tucker-Lewis Index; SRMR: Standardized Root Mean Square Residual; Partial scalar: restriction on the mean value of item 3 removed.
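The acceptance decisions in invariance tables such as Table A2 are typically based on comparing fit between successive stages. One widespread heuristic (Cheung and Rensvold) flags a CFI drop larger than 0.01 when constraints are added; whether the study applied exactly this cutoff is an assumption of this example, which merely illustrates the comparison with the CFI values from Table A2.

```python
# CFI values taken from Table A2 (justification scale)
fits = {
    "configural": {"cfi": 0.929, "rmsea": 0.065},
    "metric": {"cfi": 0.926, "rmsea": 0.062},
    "scalar": {"cfi": 0.896, "rmsea": 0.070},
}

def delta_cfi(fits, less_constrained, more_constrained):
    """CFI drop when adding invariance constraints; > 0.01 is a common red flag."""
    return round(fits[less_constrained]["cfi"] - fits[more_constrained]["cfi"], 3)

print(delta_cfi(fits, "configural", "metric"))  # small drop: metric level tenable
print(delta_cfi(fits, "metric", "scalar"))      # large drop: scalar level rejected
```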
Table A3. Regressions in the MIMIC approach for the justification scale. Shown are the regressions of the learning environment on the latent factor and the manifest indicators in a configural measurement invariance model [39].

                 Original assessment      UDA assessment
Item             Estimate    p-value      Estimate    p-value
Latent factor    −0.05       0.56         −0.16       0.09
Item 1            0.19       0.30         −0.21       0.21
Item 2            0.11       0.48          0.00       0.97
Item 3            0.07       0.63          0.00       0.97
Item 4            0.10       0.40          0.03       0.82
Item 5           −0.05       0.66          0.18       0.19
Item 6           −0.20       0.07          0.04       0.77
Item 7           −0.12       0.34         −0.14       0.26

Explanatory note: CFI: 0.924; TLI: 0.892; RMSEA: 0.00; SRMR: 0.062.
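A full MIMIC model regresses a latent factor and its indicators on a covariate and requires SEM software. The core idea of the group regressions in Table A3 can, however, be sketched with ordinary least squares: regress a score on a dummy for the learning environment. The data below are simulated with no group effect built in, so the estimated slope should be near zero; everything here is a hypothetical illustration, not the study's estimation procedure.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 170
group = rng.integers(0, 2, n)        # 0 = MR, 1 = UDL learning environment
latent = rng.normal(size=n)          # simulated ability; no group effect built in
items = latent[:, None] + rng.normal(scale=0.8, size=(n, 7))

def slope_on_group(y, g):
    """OLS slope of y on the group dummy (model with intercept)."""
    X = np.column_stack([np.ones_like(g, dtype=float), g])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

# Crude proxy for the latent regression: scale mean score on the group dummy
print(round(slope_on_group(items.mean(axis=1), group), 2))
```

A non-significant slope, as for most entries in Table A3, means the learning environment does not predict the (proxy) factor score.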
Table A4. Significant mean changes of items in both assessments on the justification scale (Bonferroni correction performed) [39].

                 Original assessment      UDA assessment
Item             MP 2 - MP 1   p          MP 2 - MP 1   p
Item 1            0.45         <0.05      −0.02         1.000
Item 2            0.12         1.000      −0.24         0.378
Item 3            0.17         1.000       0.00         1.000
Item 4            0.44         <0.05       0.18         1.000
Item 5            0.22         0.297       0.04         1.000
Item 6           −0.09         1.000       0.01         1.000
Item 7            0.17         1.000       0.10         1.000

Explanatory note: MP: measurement point.
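The Bonferroni correction used for Tables A4 and A11 multiplies each raw p-value by the number of tests and caps the result at 1, which is why many adjusted values appear as exactly 1.000. A minimal sketch with hypothetical raw p-values for seven item-wise tests:

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical raw p-values for the seven item-wise tests of one scale
raw = [0.004, 0.02, 0.143, 0.30, 0.50, 0.80, 0.95]
print(bonferroni(raw))
```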

Appendix A.2. Results of the Source, Certainty and Development Scales

Table A5. Measurement invariance of the source scale before we shortened the item set [39].

Stage           Chi-square   df   p       RMSEA   CFI     TLI     SRMR    Accepted?
Configural      103.43       58   <0.05   0.080   0.929   0.889   0.057   Yes
Metric          119.13       70   <0.05   0.076   0.923   0.901   0.070   Yes
Scalar          180.01       82   <0.05   0.099   0.846   0.831   0.089   No
Partial scalar  131.39       79   <0.05   0.074   0.918   0.906   0.072   Yes
Strict          169.78       88   <0.05   0.087   0.872   0.869   0.086   No

Explanatory note: df: degrees of freedom; CFI: Comparative Fit Index; RMSEA: Root Mean Square Error of Approximation; TLI: Tucker-Lewis Index; SRMR: Standardized Root Mean Square Residual; Partial scalar: restriction on the mean value of item 3 removed.
Table A6. Regressions in the MIMIC approach for the source scale. Shown are the regressions of the learning environment on the latent factor and the manifest indicators in a configural measurement invariance model [39].

                 Original assessment      UDA assessment
Item             Estimate    p-value      Estimate    p-value
Latent factor    −0.17       0.07          0.15       0.26
Item 1           −0.21       0.27          0.27       0.12
Item 2           −0.03       0.85          0.09       0.49
Item 3            0.05       0.68         −0.14       0.36
Item 4            0.05       0.70          0.10       0.42
Item 5            0.19       0.15         −0.12       0.44

Explanatory note: CFI: 0.911; TLI: 0.861; RMSEA: 0.075; SRMR: 0.067.
Table A7. Measurement invariance of the certainty scale before we shortened the item set [39].

Stage           Chi-square   df    p       RMSEA   CFI     TLI     SRMR    Accepted?
Configural      253.86       138   <0.05   0.083   0.904   0.873   0.064   Yes
Metric          275.62       156   <0.05   0.079   0.901   0.884   0.073   Yes
Scalar          412.36       174   <0.05   0.106   0.803   0.793   0.103   No
Partial scalar  328.45       165   <0.05   0.090   0.865   0.851   0.086   No
Strict          348.82       181   <0.05   0.087   0.861   0.860   0.089   No

Explanatory note: df: degrees of freedom; CFI: Comparative Fit Index; RMSEA: Root Mean Square Error of Approximation; TLI: Tucker-Lewis Index; SRMR: Standardized Root Mean Square Residual; Partial scalar: restriction on the mean value of item 3 removed.
Table A8. Regressions in the MIMIC approach for the certainty scale. Shown are the regressions of the learning environment on the latent factor and the manifest indicators in a configural measurement invariance model [39].

                 Original assessment      UDA assessment
Item             Estimate    p-value      Estimate    p-value
Latent factor     0.02       0.89          0.20       0.10
Item 1           −0.01       0.96          0.20       0.21
Item 2            0.06       0.69          0.39       0.01
Item 3            0.07       0.63          0.23       0.10
Item 4            0.04       0.79         −0.21       0.18
Item 5            0.05       0.74         −0.02       0.87
Item 6            0.23       0.08         −0.41       0.00
Item 7           −0.16       0.28          0.18       0.23

Explanatory note: CFI: 0.902; TLI: 0.861; RMSEA: 0.00; SRMR: 0.063.
Table A9. Measurement invariance of the development scale before we shortened the item set [39].

Stage           Chi-square   df    p       RMSEA   CFI     TLI     SRMR    Accepted?
Configural      343.39       190   <0.05   0.085   0.903   0.878   0.065   Yes
Metric          360.14       211   <0.05   0.080   0.906   0.893   0.072   Yes
Scalar          452.79       232   <0.05   0.093   0.861   0.856   0.081   No
Partial scalar  391.18       223   <0.05   0.082   0.894   0.886   0.077   No
Strict          508.56       241   <0.05   0.100   0.831   0.832   0.090   No

Explanatory note: df: degrees of freedom; CFI: Comparative Fit Index; RMSEA: Root Mean Square Error of Approximation; TLI: Tucker-Lewis Index; SRMR: Standardized Root Mean Square Residual; Partial scalar: constraint on the factor loadings of items 1 and 6 at both measurement time points.
Table A10. Regressions in the MIMIC approach for the development scale. Shown are the regressions of the learning environment on the latent factor and the manifest indicators in a configural measurement invariance model [39].

                 Original assessment      UDA assessment
Item             Estimate    p-value      Estimate    p-value
Latent factor    −0.09       0.45          0.11       0.34
Item 1           −0.19       0.86         −0.03       0.86
Item 2            0.06       0.65         −0.05       0.73
Item 3            0.03       0.75          0.09       0.57
Item 4            0.06       0.59         −0.08       0.55
Item 5           −0.02       0.90         −0.11       0.39
Item 6           −0.03       0.82          0.01       0.97
Item 7            0.01       0.91          0.22       0.17
Item 8            0.08       0.45          0.11       0.52

Explanatory note: CFI: 0.894; TLI: 0.859; RMSEA: 0.051; SRMR: 0.068.
Table A11. Significant mean changes of items in both assessments on the source, certainty and development scales (Bonferroni correction performed) [39].

                 Original assessment      UDA assessment
Item             MP 2 - MP 1   p          MP 2 - MP 1   p
Source scale
Item 1           −0.19         1.000       0.06         1.000
Item 2           −0.35         0.108      −0.04         1.000
Item 3            0.61         <0.05       0.35         <0.05
Item 4           −0.17         1.000      −0.24         0.378
Item 5           −0.54         <0.05      −0.03         1.000
Certainty scale
Item 1           −0.11         1.000       0.07         1.000
Item 2           −0.38         <0.05      −0.11         1.000
Item 3           −0.11         1.000      −0.01         1.000
Item 4           −0.39         <0.05      −0.10         1.000
Item 5           −0.34         <0.05      −0.16         1.000
Item 6           −0.75         <0.05      −0.06         1.000
Item 7           −0.49         <0.05      −0.16         1.000
Development scale
Item 1            0.49         <0.05       0.16         1.000
Item 2            0.47         <0.05       0.26         0.135
Item 3            0.46         <0.05      −0.11         1.000
Item 4            0.11         1.000       0.08         1.000
Item 5            0.26         0.243       0.19         0.729
Item 6            0.27         0.189       0.07         1.000
Item 7           −0.21         1.000      −0.22         0.513
Item 8           −0.12         1.000      −0.12         1.000

Explanatory note: MP: measurement point.
Table A12. Reformulated source, certainty and development short scales with the standardized factor loadings, mean differences and associated Bonferroni-corrected significances [39].

                 Original assessment                        UDA assessment
                 Standardized loadings   Mean values        Standardized loadings   Mean values
Item             MP 1    MP 2    MP 2 - MP 1   p            MP 1    MP 2    MP 2 - MP 1   p
Source scale
Item 2           0.50    0.66    −0.04         0.71         0.27    0.77    −0.35         0.00
Item 3           0.37    0.71     0.35         0.00         0.54    0.65     0.61         0.00
Item 4           0.78    0.78    −0.24         0.01         0.60    0.79    −0.17         0.16
Item 5           0.69    0.63    −0.03         0.74         0.52    0.69    −0.54         0.00
Certainty scale
Item 3           0.66    0.70    −0.01         0.90         0.65    0.75    −0.11         0.29
Item 4           0.55    0.70    −0.10         0.33         0.82    0.73    −0.39         0.00
Item 6           0.63    0.63    −0.06         0.50         0.64    0.71    −0.75         0.00
Item 7           0.66    0.62    −0.16         0.10         0.62    0.73    −0.49         0.00
Development scale
Item 1           0.56    0.66     0.16         0.10         0.72    0.70     0.12         0.26
Item 2           0.64    0.71     0.26         0.00         0.52    0.60     0.17         0.08
Item 4           0.65    0.76     0.08         0.41         0.63    0.69     0.44         0.00
Item 5           0.67    0.65     0.19         0.03         0.66    0.68    −0.09         0.36

Explanatory note: MP: measurement point.
Table A13. Measurement invariance models for the EB short scale source [39].

Stage           Chi-square   df   p       RMSEA   CFI     TLI     SRMR    Accepted?
Configural      37.35        30   0.167   0.045   0.985   0.972   0.041   Yes
Metric          53.18        39   0.065   0.054   0.971   0.958   0.068   Yes
Scalar          114.80       48   <0.05   0.106   0.863   0.840   0.094   No
Partial scalar  64.15        46   <0.05   0.057   0.963   0.955   0.074   Yes
Strict          136.96       58   <0.05   0.105   0.838   0.844   0.096   Yes

Explanatory note: df: degrees of freedom; CFI: Comparative Fit Index; RMSEA: Root Mean Square Error of Approximation; TLI: Tucker-Lewis Index; SRMR: Standardized Root Mean Square Residual; Partial scalar: restriction on the mean values of item 3 removed; measurement time canceled.
Table A14. Measurement invariance models for the EB short scale certainty [39].

Stage           Chi-square   df   p       RMSEA   CFI     TLI     SRMR    Accepted?
Configural      55.46        30   <0.05   0.083   0.957   0.920   0.049   Yes
Metric          71.53        39   <0.05   0.082   0.945   0.921   0.068   Yes
Scalar          119.10       48   <0.05   0.109   0.880   0.860   0.098   No
Partial scalar  76.69        44   <0.05   0.077   0.945   0.930   0.071   Yes
Strict          87.18        56   <0.05   0.067   0.948   0.948   0.067   Yes

Explanatory note: df: degrees of freedom; CFI: Comparative Fit Index; RMSEA: Root Mean Square Error of Approximation; TLI: Tucker-Lewis Index; SRMR: Standardized Root Mean Square Residual; Partial scalar: restrictions on the mean values of items 3 and 6 removed; measurement time canceled.
Table A15. Measurement invariance models for the EB short scale development [39].

Stage           Chi-square   df   p       RMSEA   CFI     TLI     SRMR    Accepted?
Configural      41.31        30   0.082   0.053   0.984   0.971   0.034   Yes
Metric          48.93        39   0.132   0.043   0.986   0.980   0.046   Yes
Scalar          69.95        48   <0.05   0.058   0.969   0.964   0.057   No
Partial scalar  63.19        46   <0.05   0.052   0.976   0.971   0.054   Yes
Strict          87.18        56   <0.05   0.067   0.948   0.948   0.067   No

Explanatory note: df: degrees of freedom; CFI: Comparative Fit Index; RMSEA: Root Mean Square Error of Approximation; TLI: Tucker-Lewis Index; SRMR: Standardized Root Mean Square Residual; Partial scalar: restriction on the factor loading of item 2 at the first measurement time point.
Figure A1. DIF analysis of the source scale from the UDA assessment concerning (1) gender, (2) reading literacy, (3) socioeconomic status, and (4) intelligence [39].
Figure A2. DIF analysis of the source scale from the original assessment concerning (1) gender, (2) reading literacy, (3) socioeconomic status, and (4) intelligence [39].
Figure A3. DIF analysis of the certainty scale from the UDA assessment concerning (1) gender, (2) reading literacy, (3) socioeconomic status, and (4) intelligence [39].
Figure A4. DIF analysis of the certainty scale from the original assessment concerning (1) gender, (2) reading literacy, (3) socioeconomic status, and (4) intelligence [39].
Figure A5. DIF analysis of the development scale from the UDA assessment concerning (1) gender, (2) reading literacy, (3) socioeconomic status, and (4) intelligence [39].
Figure A6. DIF analysis of the development scale from the original assessment concerning (1) gender, (2) reading literacy, (3) socioeconomic status, and (4) intelligence [39].
Figure A7. Simplified multi-group panel model of interindividual changes in the source scale taking into account learner characteristics. Only significant paths are shown [39].
Figure A8. Simplified multi-group panel model of interindividual changes in the certainty scale taking into account learner characteristics. Only significant paths are shown [39].
Figure A9. Simplified multi-group panel model of interindividual changes in the development scale taking into account learner characteristics. Only significant paths are shown [39].
Figure A10. Longitudinal plots of the trajectories of all students on the source, certainty and development scales from both assessments (UDA assessment n = 175; original assessment n = 165). The average trajectory is shown in black [39].

References

1. Stinken-Rösner, L.; Rott, L.; Hundertmark, S.; Menthe, J.; Hoffmann, T.; Nehring, A.; Abels, S. Thinking Inclusive Science Education from Two Perspectives: Inclusive Pedagogy and Science Education. Res. Subj. Matter Teach. Learn. 2020, 3, 30–45.
2. Brownell, M.T.; Smith, S.J.; Crockett, J.B.; Griffin, C.C. Inclusive Instruction: Evidence-Based Practices for Teaching Students with Disabilities; The Guilford Press: New York, NY, USA, 2012.
3. Sliwka, A. Diversität als Chance und als Ressource in der Gestaltung wirksamer Lernprozesse. In Das Interkulturelle Lehrerzimmer; VS Verlag für Sozialwissenschaften: Wiesbaden, Germany, 2012.
4. UNESCO. Education 2030: Incheon Declaration and Framework for Action for the Implementation of Sustainable Development Goal 4: Ensure Inclusive and Equitable Quality Education and Promote Lifelong Learning Opportunities for All; UNESCO: Paris, France, 2016.
5. CAST. Universal Design for Learning (UDL) Guidelines Version 2.2. 2018. Available online: https://udlguidelines.cast.org/ (accessed on 22 January 2021).
6. Al-Azawei, A.; Serenelli, F.; Lundqvist, K. Universal Design for Learning (UDL): A Content Analysis of Peer Reviewed Journals from 2012 to 2015. J. Sch. Teach. Learn. 2016, 16, 39–56.
7. Capp, M.J. The effectiveness of universal design for learning: A meta-analysis of literature between 2013 and 2016. Int. J. Incl. Educ. 2017, 21, 791–807.
8. García-Campos, M.D.; Canabal, C.; Alba-Pastor, C. Executive functions in universal design for learning: Moving towards inclusive education. Int. J. Incl. Educ. 2018, 24, 660–674.
9. Rao, K.; Ok, M.W.; Bryant, B.R. A Review of Research on Universal Design Educational Models. Remedial Spec. Educ. 2013, 35, 153–166.
10. Baumann, T.; Melle, I. Evaluation of a digital UDL-based learning environment in inclusive chemistry education. Chem. Teach. Int. 2019, 1, 1–13.
11. Basham, J.D.; Blackorby, J.; Marino, M.T. Opportunity in Crisis: The Role of Universal Design for Learning in Educational Redesign. Learn. Disabil. Contemp. J. 2020, 18, 71–91.
12. Edyburn, D.L. Would You Recognize Universal Design for Learning if You Saw it? Ten Propositions for New Directions for the Second Decade of UDL. Learn. Disabil. Q. 2010, 33, 33–41.
13. Hollingshead, A.; Lowrey, K.A.; Howery, K. Universal Design for Learning: When Policy Changes Before Evidence. Educ. Policy 2020, 1–27.
14. Edyburn, D. Ten Years Later: Would You Recognize Universal Design for Learning If You Saw It? Interv. Sch. Clin. 2020, 1–2.
15. Murphy, M.P. Belief without evidence? A policy research note on Universal Design for Learning. Policy Futur. Educ. 2021, 19, 7–12.
16. Bybee, R.W. Scientific Inquiry and Science Teaching. In Scientific Inquiry and Nature of Science: Implications for Teaching, Learning, and Teacher Education; Flick, L.B., Lederman, N.G., Eds.; Springer: Dordrecht, The Netherlands, 2006; pp. 1–14.
17. Hodson, D. Learning Science, Learning about Science, Doing Science: Different goals demand different learning methods. Int. J. Sci. Educ. 2014, 36, 2534–2553.
18. Hofer, B.K.; Pintrich, P.R. The Development of Epistemological Theories: Beliefs About Knowledge and Knowing and Their Relation to Learning. Rev. Educ. Res. 1997, 67, 88–140.
19. Neumann, I.; Kremer, K. Nature of Science und epistemologische Überzeugungen: Ähnlichkeiten und Unterschiede. Ger. J. Sci. Educ. 2013, 19, 209–232.
20. Sandoval, W.A.; Greene, J.A.; Bråten, I. Understanding and Promoting Thinking About Knowledge. Rev. Res. Educ. 2016, 40, 457–496.
21. Hofer, B.K. Personal epistemology as a psychological and educational construct: An introduction. In Personal Epistemology: The Psychology of Beliefs about Knowledge and Knowing; Routledge: New York, NY, USA, 2002; pp. 3–14.
22. Chinn, C.A.; Buckland, L.A.; Samarapungavan, A. Expanding the Dimensions of Epistemic Cognition: Arguments from Philosophy and Psychology. Educ. Psychol. 2011, 46, 141–167.
23. Conley, A.M.; Pintrich, P.R.; Vekiri, I.; Harrison, D. Changes in epistemological beliefs in elementary science students. Contemp. Educ. Psychol. 2004, 29, 186–204.
24. Kampa, N.; Neumann, I.; Heitmann, P.; Kremer, K. Epistemological beliefs in science—a person-centered approach to investigate high school students’ profiles. Contemp. Educ. Psychol. 2016, 46, 81–93.
25. Mason, L. Psychological perspectives on measuring epistemic cognition. In Handbook of Epistemic Cognition; Routledge: Oxfordshire, UK, 2016.
26. Elder, A.D. Characterizing fifth grade students’ epistemological beliefs in science. In Personal Epistemology: The Psychology of Beliefs about Knowledge and Knowing; Hofer, B.K., Pintrich, P.R., Eds.; Lawrence Erlbaum Associates: Mahwah, NJ, USA, 2002; pp. 347–363.
27. Greene, J.A.; Cartiff, B.M.; Duke, R.F. A meta-analytic review of the relationship between epistemic cognition and academic achievement. J. Educ. Psychol. 2018, 110, 1084–1111.
28. Cartiff, B.M.; Duke, R.F.; Greene, J.A. The effect of epistemic cognition interventions on academic achievement: A meta-analysis. J. Educ. Psychol. 2020.
29. Bybee, R.W. Achieving Scientific Literacy: From Purposes to Practices; Heinemann: Portsmouth, NH, USA, 1997.
30. Brass, J. Historicizing the Role of Education Research in Reconstructing English for the Twenty-first Century. Chang. Engl. 2009, 16, 275–286.
31. Schlüter, A.K.; Melle, I.; Wember, F. Unterrichtsgestaltung in Klassen des Gemeinsamen Lernens: Universal Design for Learning. Sonderpädag. Förd. 2016, 3, 270–285.
32. Rao, K.; Meo, G. Using Universal Design for Learning to Design Standards-Based Lessons. SAGE Open 2016, 6, 1–12.
33. King-Sears, M. Universal Design for Learning: Technology and Pedagogy. Learn. Disabil. Q. 2009, 32, 199–201.
34. Gregg, N.; Nelson, J.M. Meta-analysis on the Effectiveness of Extra Time as a Test Accommodation for Transitioning Adolescents with Learning Disabilities. J. Learn. Disabil. 2010, 45, 128–138.
35. Beddow, P. Beyond Universal Design: Accessibility Theory to Advance Testing for All Students. In Assessing Students in the Margin: Challenges, Strategies and Techniques; Information Age Publishing: Charlotte, NC, USA, 2011.
36. Lovett, B.J.; Lewandowski, L.J. Testing Accommodations for Students with Disabilities; American Psychological Association: Washington, DC, USA, 2015.
37. Thompson, S.; Thurlow, M.; Malouf, D.B. Creating Better Tests for Everyone through Universally Designed Assessments. J. Appl. Test. Technol. 2004, 6, 1–15.
38. Thompson, S.J.; Johnstone, C.J.; Thurlow, M.L. Universal Design Applied to Large Scale Assessments; National Center on Educational Outcomes: Minneapolis, MN, USA, 2002.
39. Walkowiak, M. Konzeption und Evaluation von universell designten Lernumgebungen und Assessments zur Förderung und Erfassung von Nature of Science Konzepten; Gottfried Wilhelm Leibniz Universität: Hannover, Germany, 2019. Available online: https://www.repo.uni-hannover.de/handle/123456789/5192 (accessed on 22 January 2021).
40. Salvia, J.; Ysseldyke, J.; Witmer, S. What Test Scores Mean. In Assessment in Special and Inclusive Education; Cengage Learning: Boston, MA, USA, 2016.
41. Clark, R.C.; Mayer, R.E. Introduction: Getting the Most from this Resource. In e-Learning and the Science of Instruction; John Wiley & Sons: Hoboken, NJ, USA, 2012.
42. Apple Inc. iBooks Author: Per Drag & Drop ist das Buch schnell erstellt. Available online: https://www.apple.com/de/ibooks-author/ (accessed on 15 November 2017).
43. Nehring, A.; Walkowiak, M. Digitale Materialien nach dem Universal Design for Learning. Schule Inklusiv 2020, 8, 28–32.
44. Inclusion Europe. Information for Everyone: European Rules on How to Make Information Easy to Read and Understand; Inclusion Europe: Brussels, Belgium, 2016.
45. Mayring, P. Einführung in die qualitative Sozialforschung; Beltz Verlag: Weinheim, Germany, 2016.
46. Carey, S.; Evans, R.; Honda, M.; Jay, E.; Unger, C. ‘An experiment is when you try it and see if it works’: A study of grade 7 students’ understanding of the construction of scientific knowledge. Int. J. Sci. Educ. 1989, 11, 514–529.
47. Labudde, P. Fachdidaktik Naturwissenschaft. 1.–9. Schuljahr; UTB: Stuttgart, Germany, 2010.
48. Mayringer, H.; Wimmer, H. Salzburger Lese-Screening für die Schulstufen 2–9; Hogrefe: Göttingen, Germany, 2014.
49. Heller, K.; Perleth, C. Kognitiver Fähigkeitstest für 4. bis 12. Klassen, Revision; Beltz & Gelberg: Weinheim, Germany, 2000.
50. Torsheim, T.; Cavallo, F.; Levin, K.A.; Schnohr, C.; Mazur, J.; Niclasen, B.; Currie, C.E.; the FAS Development Study Group. Psychometric Validation of the Revised Family Affluence Scale: A Latent Variable Approach. Child Indic. Res. 2016, 9, 771–784.
51. Fauth, B.; Decristan, J.; Rieser, S.; Klieme, E.; Büttner, G. Student ratings of teaching quality in primary school: Dimensions and prediction of student outcomes. Learn. Instr. 2014, 29, 1–9.
52. Sprague, E.W.; Dahl, D.W. Learning to Click. J. Mark. Educ. 2009, 32, 93–103.
53. Werning, R.; Thoms, S. Anmerkungen zur Entwicklung der schulischen Inklusion in Niedersachsen. Z. Inkl. 2017, 2, 1–4.
54. Lucke, J.F. The α and the ω of Congeneric Test Theory: An Extension of Reliability and Internal Consistency to Heterogeneous Tests. Appl. Psychol. Meas. 2005, 29, 65–81.
55. Naumann, A.; Hartig, J.; Hochweber, J. Absolute and Relative Measures of Instructional Sensitivity. J. Educ. Behav. Stat. 2017, 42, 678–705.
56. Sideridis, G.D.; Tsaousis, I.; Al-Harbi, K.A. Multi-Population Invariance with Dichotomous Measures. J. Psychoeduc. Assess. 2015, 33, 568–584.
57. Kuusela, H.; Paul, P. A Comparison of Concurrent and Retrospective Verbal Protocol Analysis. Am. J. Psychol. 2000, 113, 387–404.
58. Ryan, K.E.; Gannon-Slater, N.; Culbertson, M.J. Improving Survey Methods with Cognitive Interviews in Small- and Medium-Scale Evaluations. Am. J. Eval. 2012, 33, 414–430.
59. Anderson, D.; Lai, C.F.; Alonzo, J.; Tindal, G. Examining a Grade-Level Math CBM Designed for Persistently Low-Performing Students. Educ. Assess. 2011, 16, 15–34.
60. Bridgeman, B.; Trapani, C.; Curley, E. Impact of Fewer Questions per Section on SAT I Scores. J. Educ. Meas. 2004, 41, 291–310.
61. Wise, S.L.; Kingsbury, G.G. Modeling Student Test-Taking Motivation in the Context of an Adaptive Achievement Test. J. Educ. Meas. 2016, 53, 86–105.
62. Lamprianou, I.; Boyle, B. Accuracy of Measurement in the Context of Mathematics National Curriculum Tests in England for Ethnic Minority Pupils and Pupils Who Speak English as an Additional Language. J. Educ. Meas. 2004, 41, 239–259.
63. Zydney, J.; Hord, C.; Koenig, K. Helping Students with Learning Disabilities Through Video-Based, Universally Designed Assessment. eLearn 2020, 2020.
64. Johnstone, C.; Higgins, J.; Fedorchak, G. Assessment in an era of accessibility: Evaluating rules for scripting audio representation of test items. Br. J. Educ. Technol. 2018, 50, 806–818.
65. Muldner, K.; Burleson, W.; Van de Sande, B.; VanLehn, K. An analysis of students’ gaming behaviors in an intelligent tutoring system: Predictors and impacts. User Model. User-Adapt. Interact. 2011, 21, 99–135.
66. Quigley, D.; Ostwald, J.L.; Sumner, T. Scientific modeling. In Proceedings of the 7th International Learning Analytics & Knowledge Conference, Vancouver, BC, Canada, 13–17 March 2017; Association for Computing Machinery: New York, NY, USA, 2017; pp. 329–338.
Figure 1. Extract of the learning environment where two female researchers present the research question.
Figure 2. Illustration of the Universal Design for Assessment (UDA) assessment with the implemented read-aloud function in the original language (German) [39].
Figure 3. Differential item functioning (DIF) analysis of the justification scale from the UDA assessment concerning (1) gender, (2) reading literacy, (3) socioeconomic status, and (4) intelligence [39].
Figure 4. DIF analysis of the justification scale from the original assessment concerning (1) gender, (2) reading literacy, (3) socioeconomic status, and (4) intelligence [39].
Figure 5. Simplified multi-group panel model of interindividual changes in the justification scale taking into account learner characteristics. Only significant paths are shown [39].
Figure 6. Longitudinal plots of the trajectories of all students on the justification scale from both assessments (UDA assessment n = 175; original assessment n = 165). The average trajectory is shown in black [39].
Table 1. The four dimensions of epistemic beliefs (EBs) by naïve versus sophisticated expression [24].

| Dimension of EBs | Naïve | Sophisticated |
|---|---|---|
| **Nature of knowledge** | | |
| Certainty | Scientific knowledge is either right or wrong | Scientific knowledge consists of the reflection of several perspectives |
| Development | Scientific knowledge is a static and unchangeable subject | Scientific ideas and theories change in the light of new evidence |
| **Nature of knowing** | | |
| Source | Knowledge resides in external authorities such as teachers or scientists | Knowledge is created by the student |
| Justification | Phenomena are discovered through scientific investigation, such as experiment or observation | Knowledge is created through arguments, thinking, multiple experimentation, and observation |
Table 2. Universal Design for Learning (UDL) principles and guidelines [5]. (Guidelines 1–3 belong to Representation, 4–6 to Action & Expression, and 7–9 to Engagement, consistent with the guideline numbers used in Table 4.)

| Provide Multiple Means of Representation | Provide Multiple Means of Action & Expression | Provide Multiple Means of Engagement |
|---|---|---|
| 1. Support options for the perception of the learning content | 4. Various ways to interact with the learning content | 7. Various offers to arouse interest in learning |
| 2. Support options for the representation of linguistic and symbolic information of the learning content | 5. Various ways to express and communicate about the learning content | 8. Support options to maintain engaged learning |
| 3. Support options for a better understanding of the learning content | 6. Support options for processing the learning content | 9. Support options for self-regulated learning |
Table 3. Elements of a universally designed test [37].

| Element | Description |
|---|---|
| 1. Inclusive Assessment Population | Tests designed for state, district, or school accountability must include every student except those in the alternate assessment, and this is reflected in assessment design and field-testing procedures. |
| 2. Precisely Defined Constructs | The specific constructs tested must be clearly defined so that all construct-irrelevant cognitive, sensory, emotional, and physical barriers can be removed. |
| 3. Accessible, Non-Biased Items | Accessibility is built into items from the beginning, and bias review procedures ensure that quality is retained in all items. |
| 4. Amenable to Accommodations | The test design facilitates the use of needed accommodations (e.g., all items can be Brailled). |
| 5. Simple, Clear, and Intuitive Instructions and Procedures | All instructions and procedures are simple, clear, and presented in understandable language. |
| 6. Maximum Readability and Comprehensibility | A variety of readability and plain-language guidelines are followed (e.g., sentence length and the number of difficult words are kept to a minimum) to produce readable and comprehensible text. |
| 7. Maximum Legibility | Characteristics that ensure easy decipherability are applied to text, tables, figures, illustrations, and response formats. |
Table 4. Operationalization of the UDL learning environment.

| Operationalization | UDL Guideline |
|---|---|
| MS Sans Serif 18 | 1. |
| Line spacing 2.0 | 1. |
| Easy language | 1./2. |
| Pictorial support to distinguish text types (learning objectives, tasks, learning information) | 2. |
| Selection of the content representation form (pop-up text, comic, video) | 3./7. |
| Read-aloud function | 3. |
| Page organization | 8./9. |
| Working with a checklist | 6. |
| Self-assessment on the learning content | 5./9. |
| Working on real objects | 4. |
| iPad-based | 4. |
| Group work/peer tutoring | 5. |
Table 5. The 2×2 between-subjects design.

| Assessment | UDL Learning Environment | MR Learning Environment |
|---|---|---|
| UDA assessment | Group 1 | Group 2 |
| Standard assessment | Group 3 | Group 4 |
Table 6. The wording of the justification scale items in the UDA assessment.

| Item | Wording |
|---|---|
| Item 1 | Scientists carry out experiments several times in order to secure the result. |
| Item 2 | When natural scientists conduct experiments, natural scientists determine important things beforehand. |
| Item 3 | Scientists need clear ideas before researchers start experimenting. |
| Item 4 | Scientists get ideas for science experiments by being curious and thinking about how something works. |
| Item 5 | An experiment is a good way to find out if something is true. |
| Item 6 | Good theories are based on results from many different experiments. |
| Item 7 | Natural scientists can test their ideas in various ways. |
Table 7. Collected learner characteristics.

| Test Type | Construct |
|---|---|
| Paper-pencil test | Reading: Salzburger-Lesescreening 2–9 [48] |
| | Cognitive skills: KFT 4-12+R-N2 [49] |
| iPad | Socioeconomic status [50] |
| | Cognitive activation [51] |
| | Perception of learning success [52] |
| | Gender |
| | Age |
| | Diagnosed special needs |
Table 8. Reformulated justification short scale with standardized factor loadings, mean differences, and associated Bonferroni-corrected significances.

| Item | Orig. λ MP 1 | Orig. λ MP 2 | Orig. ΔM | Orig. p | UDA λ MP 1 | UDA λ MP 2 | UDA ΔM | UDA p |
|---|---|---|---|---|---|---|---|---|
| Item 2 | 0.25 | 0.38 | −0.24 | 0.01 | 0.72 | 0.70 | 0.12 | 0.26 |
| Item 3 | 0.21 | 0.46 | 0.00 | 1.00 | 0.52 | 0.60 | 0.17 | 0.08 |
| Item 4 | 0.44 | 0.71 | 0.18 | 0.06 | 0.63 | 0.69 | 0.44 | 0.00 |
| Item 6 | 0.50 | 0.73 | 0.01 | 0.88 | 0.66 | 0.68 | −0.09 | 0.36 |

Explanatory note: MP: measurement point; λ: standardized factor loading; ΔM: mean difference MP 2 − MP 1; Orig.: original assessment.
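The Bonferroni correction used for the mean differences in Table 8 simply multiplies each raw p-value by the number of comparisons and caps the result at 1. A minimal sketch (the p-values below are illustrative, not the study's data):

```python
def bonferroni(pvals):
    """Bonferroni-correct a list of raw p-values.

    Each p-value is multiplied by the number of comparisons k
    and capped at 1.0, which controls the family-wise error rate.
    """
    k = len(pvals)
    return [min(1.0, p * k) for p in pvals]


# Four items are tested (as in Table 8), so k = 4:
print(bonferroni([0.01, 0.02, 0.03, 0.2]))
```

A raw p of 0.01 survives four comparisons as a corrected p of 0.04; a raw p of 0.2 becomes 0.8 and is clearly non-significant.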
Table 9. Internal consistencies of the short scales: McDonald's ω.

| EBs Scale | MP 1 | MP 2 | MP 1 and MP 2 |
|---|---|---|---|
| **UDA Assessment** | | | |
| Source | 0.86 | 0.89 | 0.90 |
| Certainty | 0.80 | 0.81 | 0.85 |
| Development | 0.78 | 0.85 | 0.87 |
| Justification | 0.74 | 0.78 | 0.80 |
| **Original Assessment** | | | |
| Source | 0.83 | 0.90 | 0.89 |
| Certainty | 0.74 | 0.75 | 0.81 |
| Development | 0.75 | 0.79 | 0.82 |
| Justification | 0.41 | 0.66 | 0.62 |

Explanatory note: MP: measurement point.
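For a congeneric one-factor model with uncorrelated residuals, McDonald's ω in Table 9 can be computed from the standardized factor loadings as ω = (Σλ)² / ((Σλ)² + Σ(1 − λ²)). A sketch under that assumption; applying it to the rounded UDA MP 1 loadings from Table 8 lands close to, but not exactly at, the published value of 0.74 (the paper's estimate is based on the full data, not the two-decimal loadings):

```python
def mcdonald_omega(loadings):
    """McDonald's omega for a one-factor congeneric model.

    omega = (sum of loadings)^2 / ((sum of loadings)^2 + sum of residual variances),
    where each standardized residual variance is 1 - loading^2
    (residuals assumed uncorrelated).
    """
    s = sum(loadings)
    residual = sum(1.0 - l * l for l in loadings)
    return s * s / (s * s + residual)


# Standardized MP 1 loadings of the UDA justification items (Table 8):
print(round(mcdonald_omega([0.72, 0.52, 0.63, 0.66]), 2))  # → 0.73
```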
Table 10. Measurement invariance models for the EB short scale justification.

| Stage | Chi-Squared | dF | p | RMSEA | CFI | TLI | SRMR | Accepted? |
|---|---|---|---|---|---|---|---|---|
| Configural | 41.31 | 30 | 0.082 | 0.053 | 0.984 | 0.971 | 0.034 | Yes |
| Metric | 43.47 | 39 | 0.287 | 0.030 | 0.991 | 0.987 | 0.054 | Yes |
| Scalar | 69.95 | 48 | <0.05 | 0.058 | 0.969 | 0.964 | 0.057 | Yes |
| Strict | 70.96 | 59 | 0.137 | 0.039 | 0.976 | 0.977 | 0.073 | Yes |

Explanatory note: dF: degrees of freedom; CFI: Comparative Fit Index; RMSEA: Root Mean Square Error of Approximation; TLI: Tucker–Lewis Index; SRMR: Standardized Root Mean Square Residual.
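As an illustration of how one fit index in Table 10 relates to the chi-squared statistic: the single-group point estimate of the RMSEA is √(max(χ² − dF, 0) / (dF · (N − 1))). The sketch below uses hypothetical values; the multi-group invariance models reported above additionally weight by the number of groups, so their RMSEA values cannot be reproduced from this single-group formula alone:

```python
import math


def rmsea(chi2, df, n):
    """Single-group point estimate of the Root Mean Square Error of
    Approximation: sqrt(max(chi2 - df, 0) / (df * (n - 1))).

    A model fitting better than its degrees of freedom (chi2 <= df)
    yields an RMSEA of exactly 0.
    """
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Hypothetical model: chi2 = 50 on 30 degrees of freedom, N = 200
print(round(rmsea(50.0, 30, 200), 3))  # → 0.058
```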
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.