Adequate measurement and feedback are needed to assess students’ level (and development) of SIC and the effectiveness of learning materials. Several general scientific inquiry tests have been developed and used for formal and informal assessment (e.g., [9,10,11]; see Table S2), but many of these instruments were developed only for students up to 10th grade. Generally, these instruments confound cognitive and manual skills, and most are not subject-specific. Subject-specificity is important, however, since learners deal on the one hand with a subject-specific knowledge base (e.g., ecology, enzymology, ethology, neurophysiology), which poses specific demands, e.g., on cognitive load or on conceptual knowledge [12,13], and on the other hand with subject-specific objects (e.g., animals in biology) that require special handling [14]. An additional feature of previously developed science inquiry instruments is that only a few of these tests (e.g., [9,15]) make use of open test formats. Open test formats represent a compromise between multiple-choice formats (which are very practical and economical but unspecific) and performance assessments (which can capture the depth of inquiry [16] but are often very time-consuming, costly, and seldom practical for normal school use [17]).
This paper aims to describe steps towards the development, psychometric evaluation, and use of an open-ended assessment format based on Rasch measurement analyses. With this approach to the assessment of SIC, we want to demonstrate the potential of Rasch measurement for the integration of theory and practice in the teaching and learning of SIC, and to address the three shortfalls described above in existing science inquiry instrumentation: the need for an instrument that is suitable for upper secondary students, that is subject-specific, and that uses open test formats.
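For orientation, the following is a minimal sketch of the measurement model underlying such analyses. It shows the standard dichotomous Rasch model; open-ended items scored in ordered categories are typically analyzed with polytomous extensions such as the partial credit model:

\[ P(X_{ni} = 1 \mid \theta_n, \delta_i) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)} \]

Here, \(\theta_n\) denotes the ability of person \(n\), and \(\delta_i\) the difficulty of item \(i\). Because both parameters are expressed on a common logit scale, person abilities and item difficulties can be compared directly, a property that underlies the interpretive uses of Rasch measurement referred to above.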
1.1. Scientific Inquiry Competence
To develop a SIC assessment, it is important to detail how the construct is defined, namely what one must be able to do to be “competent” in scientific inquiry. Several authors [1,18,19] have pointed out that scientific inquiry is a problem-solving process concerned with problems and phenomena in the natural world. Hence, SIC is defined here as the ability to solve problems about the living natural world by using scientific methods. Scientific inquiry is a broad topic that encompasses many different scientific methods (e.g., observation, comparison, or experimentation; [20]). By using the term “competence”, we seek to emphasize the cognitive aspects of the ability to use problem-solving procedures rather than manual skills [8,21]. The focus here lies on experimentation on causal relations, because the experiment is considered the central method of science [22]. Hence, for this instrument, SIC is defined as the ability to understand, conduct, and critically evaluate scientific experiments on causal relationships.
To foster and assess SIC, it is necessary to define specific components of what it means to be competent. Certainly, the scientific problem-solving process (the process of science inquiry as reflected by experiments on causal relations) is not a linear process with a defined order of steps [23,24,25], but if one attempts to measure competence (with respect to science inquiry through experimentation), a specific set of experimental procedures to be mastered has to be identified [26].
Table A1 provides an overview of how different authors (of tests, models, and learning materials) have conceptualized SIC with regard to experimentation and its requirements. The literature review suggests three key components (sub-competences) covering the cognitive aspects of inquiry that must be met to competently master an experimentation task (Figure 1).
Students should be able to formulate scientific questions and generate testable hypotheses. Inquiry starts with a phenomenon and a resulting problem. Questions arise from this problem, and students should have the ability to formulate the problem [27] and/or generate adequate research questions (e.g., [20,28,29,30]). Questions addressing experimentation on causal relations must consider the relationship between dependent and independent variables [31,32,33]. To answer a scientific question, one must generate testable hypotheses (e.g., [10,11,29]). To generate testable hypotheses, one must identify or name the independent variables [34] and the dependent variables (e.g., [15,20]). These variables should be formulated as a prediction of the outcome of an experiment, i.e., the hypothesis is phrased in an “if-then” form (e.g., [9,31,35]), for instance, “if the light intensity is increased, then the plant will grow faster”. These hypotheses or predictions should be based on prior knowledge, analogies, theories, or principles and thereby justified (e.g., [18,20,36]). In addition, alternative hypotheses are needed, involving different independent variables or different predictions, each of which can then be ruled out (e.g., [32,33]).
Students should be able to design an experiment. To test each hypothesis, one must design and conduct an experiment. Therefore, students should be able to vary the independent variables, operationalize and measure the dependent variables (e.g., [15,20,29,30,37]), and control confounding variables (e.g., [33,38]). Furthermore, students need to be able to define test times (i.e., start, interval, and duration of measurement; e.g., [20,30,32]) and to use experimental replications (e.g., [20,34,39]).
Students should be able to analyze data. Regarding data analysis, students should be able to objectively describe data (e.g., [15,32,36]) before interpreting them with respect to the hypotheses (e.g., [9,10,11,30,40]). Additionally, the certainty and the limitations of interpretations must be discussed (e.g., [20,33]); for example, the validity of the results [27] must be discussed and considered [41]. Furthermore, the entire methodological design of the experiment must be evaluated critically to detect any flaws [29,42] and to provide guidance on ways in which the experiment could be improved [34]. Data analysis often results in the generation of new questions and/or the modification of hypotheses, which then must be tested again. Hence, it is important for students to be able to provide an overview of what steps might follow a specific experiment. Students should also understand that research does not end with data interpretation [20,28,38].
Beyond identifying the abilities students should have, a wide range of research has explored students’ actual levels of competence. Author et al. [32], for instance, found that most students (up to 10th grade) fail to formulate questions about a quantitative relation of variables. Hofstein et al. [28] found that particularly inexperienced students exhibit difficulties generating scientific questions. Another noteworthy shortfall in generating questions is that students often fail to focus on one single factor, which can lead to confounded experiments [43]. Regarding the skill of “generating hypotheses”, de Jong & van Joolingen [44] have pointed out that students simply “may not know, what a hypothesis should look like” [44] (p. 183), i.e., that it should describe the relationship between variables. Another common misconception is that students view hypotheses as statements requiring confirmation rather than as predictions to be tested. Students also often fail to formulate alternative hypotheses [18,45]. One result of this inability is that students often retain their original hypothesis even in the face of conflicting data [46].
Studies have also suggested student difficulties with designing and conducting experiments. Schauble and colleagues [47] found that students often consider only one variable and seem not to understand the relationship between the two variables explored in an experiment. Students also exhibit difficulties in understanding the operationalization of variables [47]. Another common problem is that students often design or conduct experiments with only one variant of the independent variable, i.e., studies are designed without a control [48]. A related problem is that students alter more than one independent variable at a time and consequently fail to plan unconfounded experiments [49]. Duggan, Johnson, and Gott [50] commented on student difficulties in operationalizing the independent variable as continuous and in controlling confounding variables. An additional shortfall concerns a lack of understanding of the importance of repeated measurements [51,52].
Regarding the sub-competence of “data analysis”, Germann and Abram [41] describe that students often fail to take the hypothesis into consideration while interpreting data, yet still draw conclusions and provide reasons for them. Lubben & Millar [52] and Roberts & Gott [53] found that students often do not identify anomalous results and fail to evaluate them critically while drawing conclusions. Roberts & Gott [53] also determined that students have problems considering sample size, representativeness, and design validity when drawing conclusions.
The problems and misconceptions detailed above occur among many students across a wide age range. Some of these problems evidently occur less often among older students; however, others can be observed among students of various ages. Tamir et al. [15], for example, investigated older students and found that the majority of 12th graders can, to some extent, solve new problems in the laboratory. Nevertheless, difficulties remain with the accurate formulation of hypotheses and the identification and definition of dependent and independent variables. Concerning experimental design, students exhibit weaknesses such as not being able to adequately control variables. In addition, students exhibit difficulties explaining research findings and often offer “intuitive explanations” or “explanations based on teleology and/or anthropomorphism” while analyzing data ([15], p. 49; [39]).
1.2. Assessments of Scientific Inquiry Competences
Several instruments have been developed to assess SIC. Table A2 presents a summary of SIC instrumentation developed over the last 30+ years for a range of grade levels and in a variety of formats. Table A2 reveals that most of these instruments are designed for and used through grade 10. One result of this targeting is that little is known so far about the competences of older students. Nevertheless, knowledge about competences in this age group remains important, since it marks the transition from school to possibly working in the scientific field.
Another factor to consider when reviewing past instruments is the test response format. Many tests make use of multiple-choice formats (e.g., [10,11,30,38,40]), which are economical regarding administration and scoring time but do not allow for evaluation of students’ ability to construct and formulate an individual response [15,16,40]. Some past instruments have made use of hands-on activities (e.g., [9,15,31,36,37,41]). One reason for the common use of more closed formats may be that such authentic tasks are often complicated and require group work, which complicates individual assessment [54].
Several instruments presented in Table A2 have more than one focus, i.e., they combine testing practical skills and cognitive problem-solving abilities (e.g., [41]). An additional characteristic of many previous instruments is that the practical tasks often apply scenarios lacking in complexity (e.g., [41] ask students to mix hot and cold water). Complex practical tasks are very time-consuming for the teacher, often require small groups of students to work together, and often require physical materials for assessment; hence the lack of complex experiments in assessments. Tamir et al. [15], for example, are a rare exception in using complex experiments. A final point: none of the existing instruments use the organizational framework of “aspects” (as provided in Figure 1) to guide what the instruments measure. Our research used these aspects to guide instrument design, data analysis, and the interpretation of results.
To fill this research gap and instrumentation gap, the purpose of this study is (1) to develop an assessment instrument that evaluates students’ SIC in biology and is suitable for Rasch analyses, and (2) to evaluate the instrument’s functioning in order to discuss its potential for research and teaching. Development of such an instrument will provide insight into upper secondary students’ SIC using demanding and relevant contexts.