1. Introduction
Many states in the U.S. have adopted the Next Generation Science Standards (NGSS), which call for a paradigm shift in science teaching, learning, and assessment [1]. This shift advocates an approach that encourages discourse production among students and teachers using multiple repertoires, including different languages and multimodality, to enhance the development of disciplinary practices in STEM as well as metalinguistic awareness [2]. It entails viewing students as active participants who “do science” rather than passively “learn science”, using language to engage in scientific inquiry and communicating ideas through various multimodal resources, such as gestures, visuals, computer models, and home languages [3]. This highlights the essential connection between language development and science learning: students are no longer limited to accessing “scientific content” or achieving literacy only by reading about science; instead, they attain literacy through active inquiry, hands-on activities, and interaction that draws on their complex linguistic repertoires [4].
This new approach challenges researchers, educators, curriculum designers, and administrators to rethink pedagogical practices and to design and interpret written assessments in ways that align with this progressive perspective. Moreover, equity issues for all students, including Multilingual Learners (MLs) in STEM, have become central to ongoing discussions [5]. Shifts in applied and educational linguistics, such as the concept of translanguaging, require recognizing MLs’ dynamic multilingual repertoires as central to justice-oriented and decolonizing approaches in STEM education [6,7,8].
However, conventional analysis of written student assessments has predominantly relied on numeric hand scores assigned through a rubric, together with summative feedback, to provide information about student learning and to measure teacher accountability [9]. This approach fails to consider the sociocultural context of students’ lives, which significantly affects their thinking processes and the nature of their responses [10]. Additionally, by prioritizing vocabulary and grammar as prerequisites for scientific learning, traditional methods often remove MLs from regular science classes to focus on English language development. This deprives many MLs of valuable opportunities to develop language skills and scientific understanding simultaneously, often hindering their academic trajectories. Without recognizing that language is at the core of the learning process, and that students learn through language rather than only about language, MLs do not have the opportunity to fully engage with content, which hinders their academic development [11].
In this context, the challenge of constructing equitable written science assessments and developing models to interpret them is both complex and urgent, given the expanding population of MLs in the U.S. [12]. Indeed, research has documented that reading and writing accurately across content areas is not enough without understanding the unique patterns each discipline uses to communicate meaning [13,14]. These patterns must be taught explicitly, as knowledge of appropriate linguistic choices is not intuitive. Schools and teachers must therefore help students understand the power of different linguistic choices in conveying various meanings in different contexts [15].
For MLs, the learning challenge is multifaceted. They must learn English while simultaneously mastering academic and technical concepts specific to each content area. In science, the additional complexity lies in mastering the language of science (LoS), which involves understanding scientific terminology, comprehending complex texts, interpreting data, and engaging in scientific discourse using precise, discipline-specific language. Therefore, the dual and long-standing challenge for MLs in U.S. schools is that of developing disciplinary content knowledge in a language they have not yet fully mastered while navigating an educational system that lacks effective and equitable assessment measures in science [3,9].
Current research highlights the need to explore innovative and reliable methods to measure and interpret MLs’ written performance in science [9]. While the need for differentiated instruction for culturally and linguistically diverse students is well-established [16,17], the conversation has not equally addressed the differentiation and design of formal written assessments.
In this paper, we introduce the Multidimensional Assessment Performance Analysis (MAPA) model, a working multimodal framework designed to interpret written science assessments in a comprehensive and less restrictive manner. This framework aims to enhance coherence between students’ understanding of the language of science, classroom practices, and written assessments. To this end, MAPA expands the scoring of constructed-response tests by analyzing students’ language through a combination of Systemic Functional Linguistics (SFL) tools, which capture student performance in terms of language use, and topic modeling, which provides additional insight into students’ thinking and reasoning by detecting the latent structure underlying their writing [18].
Investigating models of written assessment in science tailored to MLs in K-12 is crucial for promoting equity and inclusion. Such models help to measure students’ knowledge accurately by distinguishing scientific understanding from language proficiency. Additionally, assessments identify specific areas where students need support, leading to targeted instructional strategies. They also enhance engagement by being more relevant to students’ linguistic and cultural backgrounds, raise teacher awareness of the importance of equitable practices, and contribute to better long-term educational outcomes for MLs. Moreover, the lack of basic scientific literacy among students leaving school has become a persistent national issue. It is evident that the language of science contributes to this problem and, thus, should also play a key role in any proposed solutions [2].
The remainder of this article is structured as follows: an introduction to the challenges of assessing MLs’ learning in science; an exploration of the complexity of the Language of Science (LoS) and its implications for learning and assessment; a description of the Multidimensional Assessment Performance Analysis (MAPA) framework, with an overview of the theoretical and empirical foundations supporting it; and conclusions and implications, with recommendations.
2. Understanding the Language of Science and Its Connections to Written Assessments
To address the issues and complexities inherent in designing and interpreting equitable written science assessments, it is necessary to understand the role played by the language of science (LoS) in learning. The LoS is particularly challenging for students to learn and master, a difficulty that is especially pronounced for MLs, not because of their abilities, but because of its complexity. The LoS features specialized vocabulary and complex syntax and demands data interpretation skills and evidence-based argumentation, elements that are difficult for students to develop without proper scaffolding. While the LoS is vital to the scientific community, as it codifies and communicates scientific findings and advancements, its complexity, particularly in written form, often discourages student engagement in the absence of adequate language support [2].
Fundamentally, the LoS should facilitate the abstract communication of real-world phenomena, a skill essential for students as they develop more complex theories and understandings throughout middle and high school. However, the LoS differs significantly from everyday language and from the language typically used by students. As noted, “most of us would say that we recognize scientific language when we hear it, and this has implications for students’ use of language in the science classroom” [2] (p. 24). For example, in science, students might use the word “hypothesis” instead of “guess”, demonstrating an understanding of experimentation and prediction. In a chemistry class, students might describe a chemical reaction using terms such as “reactants” and “products” rather than saying that substances “mix” and “change”. These linguistic distinctions help students communicate more precisely. Two main characteristics distinguish the LoS from everyday language: technicality, which involves the use of specific terminology, and rationality, which pertains to the structural composition of scientific language [19]. To engage students deeply with the LoS, both characteristics must be explicitly taught in schools.
Introducing the LoS early in education, with an understanding of students’ diverse linguistic backgrounds and cultures, is vital, as “language both shapes and is shaped by our experience with the world around us” [2] (p. 27). Consequently, a student’s background influences their understanding of both the world and science, and it is important to invite students to contribute their own ideas and experiences with scientific concepts, thereby enhancing their understanding. In classrooms across the U.S., students are currently more likely to read about and listen to the LoS than to actively engage with it through speaking or writing. This passive engagement often leads to the memorization of information rather than the exploration and inquiry needed for genuine understanding [20], as advocated by the Next Generation Science Standards (NGSS). More specifically, the new standards call for transitioning from “learning context and inquiry in isolation to building knowledge in use” [21] (p. 158). The adoption of the NGSS thus signifies a meaningful transformation in educational practices, moving from traditional knowledge dissemination to innovative knowledge creation. This shift highlights the intricate nature of teaching expertise and emphasizes the need for a deep understanding of environments that foster generative, rather than reproductive, learning.
A fundamental ability to interpret and produce the language of science is a prerequisite for developing sustained scientific discourse, which is necessary for engaging in the science and engineering practices endorsed by the NGSS. Despite this, students spend more class time learning content through reading and listening than producing any form of science-related discourse, such as talking or writing. This gap is particularly relevant to the ongoing issues and debates surrounding written science assessments. In other words, classroom instruction and assessment practices are disconnected: students passively receive information in class yet are assessed on tasks that demand the application and articulation of scientific concepts, creating a disparity between students’ learning experiences and their assessment performance. If educators do not establish meaningful links between the formative and summative aspects of classroom instruction and assessment, questions about the legitimacy of external student assessments will persist [22].
Additionally, students learning science must navigate multiple discursive languages [23], and this specialized language of science education poses a significant challenge for most students [24,25] due to the semiotic resources required for clear, concise, and accurate communication. When focusing on written text, factors such as thematic patterns [26], high lexical density, and socio-semiotic grammatical shifts, including nominalization [19], contribute to these challenges. Research indicates that the written language of school science is characterized by high lexical density, abstraction, and technicality [27].
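To make the notion of lexical density concrete, the sketch below computes it for an everyday sentence and a more scientific one. This is a minimal illustration, not part of the MAPA instruments: it assumes a common simplification of lexical density (the proportion of content words among all word tokens) and relies on NLTK’s off-the-shelf tokenizer and part-of-speech tagger.

```python
# A minimal sketch of one way to quantify lexical density: the proportion of
# content words (nouns, verbs, adjectives, adverbs) among all word tokens.
# This is a simplification of the construct discussed in the literature.
# Requires: pip install nltk, plus nltk.download("punkt") and
# nltk.download("averaged_perceptron_tagger").
import nltk

CONTENT_TAG_PREFIXES = ("NN", "VB", "JJ", "RB")  # nouns, verbs, adjectives, adverbs

def lexical_density(text: str) -> float:
    tokens = [t for t in nltk.word_tokenize(text) if t.isalpha()]
    if not tokens:
        return 0.0
    tagged = nltk.pos_tag(tokens)
    content = [w for w, tag in tagged if tag.startswith(CONTENT_TAG_PREFIXES)]
    return len(content) / len(tokens)

everyday = "The water went away because it got really hot outside."
scientific = "Evaporation occurs when thermal energy increases molecular motion."
print(f"everyday:   {lexical_density(everyday):.2f}")
print(f"scientific: {lexical_density(scientific):.2f}")
```

On sentences like these, the scientific version typically yields the higher ratio, mirroring the density findings reported above.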
3. Nominalization
The language of science transforms students’ concrete life experiences into abstract entities through nominalization, a process described by Halliday [28]. This grammatical function, which employs nominalizations and passive structures, is essential for organizing and describing subject content. Nominalization can be challenging for students because it transforms concrete verbs and adjectives into abstract nouns, making language harder both to understand and to use. The process makes language denser and more packed with information, requiring students to interpret and organize ideas differently.
Halliday and Matthiessen [29] have proposed that this key component of written language complexity likely originated within the scientific domain. Nominalization converts verbs, adjectives, or circumstances into nouns, making language more abstract. Notably, research shows that nominalization varies across genres and disciplines, such as scientific versus other academic discourses [30], and across languages, for example, in Spanish academic writing [31]. These variations are influenced by disciplinary conventions and the expectations of individual academic communities [32].
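As a rough illustration of how nominalizations might be flagged in student writing, the sketch below counts words carrying common deverbal and deadjectival suffixes. This is only a heuristic under stated assumptions: the suffix list and sample response are ours, and a real analysis would require morphological parsing and manual validation.

```python
# An illustrative heuristic for flagging candidate nominalizations: abstract
# nouns derived from verbs or adjectives often carry suffixes such as -tion,
# -ment, -ness, or -ity. Suffix matching alone overgenerates (note the false
# positive "environment" below), which is why validation matters.
NOMINAL_SUFFIXES = ("tion", "sion", "ment", "ness", "ity", "ance", "ence")

def candidate_nominalizations(text: str) -> list[str]:
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    return [w for w in words if len(w) > 6 and w.endswith(NOMINAL_SUFFIXES)]

response = ("The evaporation of water and the condensation of vapor show "
            "a movement of energy through the environment.")
print(candidate_nominalizations(response))
# ['evaporation', 'condensation', 'movement', 'environment']
```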
4. Use of Clarifying Words
Another linguistic mechanism involves the use of interconnecting, descriptive, and clarifying words and phrases to express semantic relationships between similar or dissimilar phenomena [26]. This practice enables scientists to construct categories and classes, forming taxonomic relationships [19]. According to Lemke [26], developing an understanding of these semantic relationships is central to all meaning-making processes in science instruction. For example, in biology, organisms are organized taxonomically into hierarchical groups; in chemistry, students are required to recognize periodic trends in the periodic table; in ecology, ecosystems are explained through interconnected terms like “producers” and “consumers”. These semantic relationships convey the logical structure of scientific knowledge as well as the relationships among scientific ideas, and scientific concepts derive meaning by being organized taxonomically into larger thematic patterns [26].
Understanding these linguistic complexities is essential for designing and interpreting equitable written science assessments, as is explicitly teaching the elements of the LoS while recognizing patterns of variation that stem from MLs’ home languages and from the academic genres learned at school.
5. Background and Literature Review
This section aims to provide a comprehensive background and literature review on the current state of written assessments in science education, with a particular focus on MLs. To achieve this, we have reviewed literature that addresses various methodological frameworks and theoretical paradigms. We have chosen Evidence-Based Practice (EBP) as our overarching methodological framework because it aims not only to gather theoretical insights but also to develop conceptual and practical knowledge that can inform future research and practice. EBP is particularly suitable for this review as it emphasizes “using scientific evidence to inform instructional decision making” [33] (p. 143).
5.1. The Role of Written Science Assessments for Multilingual Learners
The design, implementation, and interpretation of written science assessments need to address the numerous issues faced by students and educators, particularly those related to the increasing population of MLs. A significant challenge is the gap between learning science content and being proficient in English. Utilizing various assessment methods can help bridge this gap, thereby creating a more conducive learning environment for MLs.
To understand the field of written science assessments, it is essential to recognize the evolving nature of assessment and science learning paradigms alongside international migratory patterns. These patterns have increased the necessity for schools to support an expanding multilingual student population [34]. This shift has prompted the development of guidelines designed to enhance awareness and address this issue. For instance, the U.S. National Research Council has proposed new assessment guidelines that advocate for diverse assessment options, such as portfolios, performance tasks, and mixed-item formats [35]. These progressive steps aim to validate and integrate students’ linguistic repertoires in various classrooms, thereby promoting equity for all students [36,37]. Importantly, these guidelines emphasize considering the sociocultural context of learning in both the design and application of assessments.
5.2. Historical Perspectives and Scaffolding
If a test does not follow the recommendations for fairness in test design, several adverse outcomes can arise, including outcomes concerning interpretation. For example, without appropriate scaffolds tailored to MLs, students may not fully understand and respond to assessment tasks, leading to inaccurate representations of their knowledge and abilities. This can result not only in unfair disadvantages but also in misinterpretations of their academic progress and learning.
Topic Models (TM; e.g., [38]) and Systemic Functional Linguistics (SFL) could be valuable tools for analyzing test data and attempting to enhance fairness post hoc. However, their effectiveness would be limited if the original design lacked fairness considerations. In other words, while TM and SFL can offer insights into patterns and biases within test responses, the fundamental issues stemming from an unfair test design might still impede their overall utility in rectifying disparities.
Historically, educators have recognized the importance of scaffolding to support multilingual learners (MLs) in successfully completing assessment tasks. As outlined by Gottlieb [39], four primary categories of scaffolds have been effectively employed: linguistic, graphic, sensory, and interactive. Linguistic scaffolds are crucial in breaking down language barriers that MLs often face. These scaffolds include strategies such as defining key terms to make sure that students understand crucial vocabulary before tackling complex tasks. They might also involve providing sentence starters or language frames, which can help students structure their responses more effectively and confidently.
Graphic scaffolds provide visual aids that help MLs organize and connect information. Tools like graphic organizers, charts, and diagrams serve as visual representations of complex concepts, enabling students to see relationships and hierarchies within the content. These tools are particularly beneficial for visual learners as they can convert abstract ideas into tangible visual cues, aiding in comprehension and retention of information.
Sensory scaffolds incorporate models and multimedia resources to engage multiple senses and present information in diverse ways. Tangible models, videos, and audio recordings can help educators create a more dynamic learning environment that caters to different learning preferences and helps MLs grasp difficult concepts. These resources can make learning more interactive and memorable, providing another layer of support for understanding and applying knowledge.
Finally, interactive scaffolds facilitate collaboration and peer communication, promoting a social aspect of learning which not only helps develop language skills but also fosters a sense of community and support among learners. Together, these scaffold types create a robust framework that helps bridge comprehension gaps, allowing MLs to showcase their true academic abilities while developing both content knowledge and language proficiency.
5.3. Equity and Community in Science Education
Settlage and Williams [40] advocate for equity in science education through community-building, empowering educators, and fostering teacher growth. They argue that science education requires substantial reforms at every level, from classroom teaching to theoretical frameworks. Schools need to recognize and address the unique challenges faced by specific student groups and provide appropriate scaffolds, rather than treating all students uniformly.
To promote equity in science education, Settlage and Williams propose a systemic approach encompassing six categories: Instructing, Theorizing, Mentoring, Partnering, Reviewing, and Advising. They gathered feedback from participants on their areas of interest and expertise and focused on the areas where those participants could contribute and grow. While emphasizing community, they also caution that only those committed to genuine participation should engage in these discussions.
5.4. Semantic Relations in Science Language
Halliday and Matthiessen [29] differentiate between syntagmatic and paradigmatic semantic relations. Syntagmatic relations involve connections between words within a sentence, typically from different word classes, as in “snow-skiing” or “photosynthesis occurs”, where “photosynthesis” is a noun and “occurs” is a verb. In contrast, paradigmatic relations link words within the same class, establishing taxonomic relationships, such as “mammal”, “reptile”, and “bird”, which are all nouns representing different animal classes in biology. Research indicates that adults tend to use paradigmatic relations more than children, who rely more on syntagmatic relations, suggesting that understanding semantic relations is crucial for developing scientific proficiency [39,41].
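As a small illustration of how paradigmatic (taxonomic) relations can be probed computationally, the sketch below asks WordNet, via NLTK, for the nearest shared category of two class labels. This is an illustrative probe under our own assumptions, not a method proposed in the literature cited above.

```python
# An illustrative probe of paradigmatic (taxonomic) relations using WordNet
# through NLTK (requires nltk.download("wordnet")). Terms in the same
# scientific taxonomy tend to share a nearby common hypernym.
from nltk.corpus import wordnet as wn

def common_category(word_a: str, word_b: str):
    syn_a = wn.synsets(word_a, pos=wn.NOUN)
    syn_b = wn.synsets(word_b, pos=wn.NOUN)
    if not syn_a or not syn_b:
        return None
    return syn_a[0].lowest_common_hypernyms(syn_b[0])

# "mammal" and "reptile" are paradigmatically related class labels;
# WordNet typically places both under vertebrate.n.01.
print(common_category("mammal", "reptile"))
```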
Verhallen and Schoonen [41] found that MLs who use their second language only at school and their first language in everyday life tend to use fewer paradigmatic associations than their peers. A Swedish study corroborates these findings, showing that primary school students receiving instruction in both Swedish and Arabic exhibited a higher use of paradigmatic relations.
In conclusion, current research highlights the increasing need to understand the complexities of the LoS and its impact on the construction and interpretation of data from written science assessments.
6. Developing the Multidimensional Assessment Performance Analysis (MAPA) Framework
This section explores the conceptual and practical principles of the Multidimensional Assessment Performance Analysis (MAPA) framework, designed to interpret written science assessments, particularly for multilingual learners (MLs). This working framework aims to enhance coherence between our understanding of the Language of Science (LoS), classroom practices, and written assessments.
Acknowledging the “dynamic nature of the virtual/actual interference of context and text” [42] (p. 19), that is, the interplay of various dynamic variables in the contexts of production, MAPA has been refined since 2015 by a team of researchers on an NSF-funded project to improve teachers’ understanding of MLs’ written science inquiry practices [43]. The research team created bilingual (Spanish and English) constructed-response assessments of science and language practices for ML middle and high school students, who participated in a longitudinal study as part of the project.
MAPA is premised on the idea that teachers require a reliable source of information to adequately reflect on and comprehend student writing in science. While a protocol was developed to perform Rubric-Based Assessment (RBA), the need to answer key questions beyond numeric scores led to the incorporation of SFL and Topic Model analyses. Each analysis provides unique information about students’ engagement with scientific inquiry practices and their ability to communicate them in writing.
Evaluating assessment outcomes to gauge growth in inquiry, content, and language development—as outlined in our initial RBA protocol—revealed the potential to explore shifts in conceptualizing science and language learning further. Specifically, the framework supports a transition from a structural approach, focused on technical vocabulary and grammar, to a functional approach centered on meaning-making processes.
MAPA emphasizes that RBA scores alone do not present the entire picture of student learning. Assessment analysis within MAPA is informed by a Systemic Functional Linguistics (SFL) approach [15,37,44,45,46] and by Topic Model analysis such as Latent Dirichlet Allocation (LDA; [38]), as previously explored in related research [8]. This multilayered analysis of student responses aims to explore multiple dimensions of MLs’ performance and to provide insights into its variability.
SFL conceptualizes language as a “set of options available for construing different kinds of meanings” [15] (p. 7). Hence, our analysis focuses on identifying specific linguistic features essential for scientific writing in school contexts, examining technical vocabulary usage, lexical density, and nominalization or grammatical drift. By scoring these SFL features, it is possible to report an average and standard deviation for each, focusing on what students have produced rather than on what is missing.
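The sketch below shows what such feature scoring could look like in practice: each response is scored for one SFL-related feature (here, hits against a technical-vocabulary list), and the class-level mean and standard deviation are reported. The word list and sample responses are illustrative placeholders, not the MAPA instruments.

```python
# A minimal sketch of reporting an SFL-style feature score as a mean and
# standard deviation across responses, crediting what students produced.
# The technical-term list and responses are illustrative placeholders.
from statistics import mean, stdev

TECHNICAL_TERMS = {"evaporation", "condensation", "precipitation", "energy", "vapor"}

def technical_term_count(response: str) -> int:
    words = {w.strip(".,;:!?").lower() for w in response.split()}
    return len(words & TECHNICAL_TERMS)

responses = [
    "Evaporation happens when the sun heats water and it becomes vapor.",
    "The water goes up and then it rains.",
    "Condensation and precipitation return the water, moving energy around.",
]
scores = [technical_term_count(r) for r in responses]  # [2, 0, 3]
print(f"mean = {mean(scores):.2f}, sd = {stdev(scores):.2f}")  # mean = 1.67, sd = 1.53
```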
Topic Models are probabilistic models that aim to uncover the latent thematic structure hidden in a collection of students’ responses to constructed-response items [38]. In other words, these methods use statistical algorithms to automatically identify and summarize latent topics representing students’ reasoning and thinking, and to allocate students to those topics, based on co-occurrences of words in students’ responses to the test questions [47]. These results provide additional information about students’ science ability while exploring how their backgrounds influence their understanding of both the world and science.
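As a concrete, minimal example of this kind of analysis, the sketch below fits LDA to a handful of made-up student responses with scikit-learn. The corpus, number of topics, and preprocessing are illustrative assumptions; a real analysis would tune these choices and validate topic coherence.

```python
# A minimal sketch of fitting LDA to constructed-response answers with
# scikit-learn. Corpus and topic count are illustrative placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

responses = [
    "Water evaporates when heated and rises as vapor.",
    "The plant uses sunlight to make food through photosynthesis.",
    "Vapor cools, condenses into clouds, and falls as rain.",
    "Leaves absorb light so the plant can produce glucose.",
]
vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(responses)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(doc_term)  # per-response topic proportions

# Print the most heavily weighted terms for each latent topic.
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:4]]
    print(f"topic {k}: {top}")
```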
Preliminary findings [9,48] suggest that comprehensive insights from MLs’ responses can reveal emergent understandings with significant instructional implications, demonstrating the value of complementary methods for interpreting student performance on assessments. This comprehensive analysis helps ensure that the gathered data effectively inform curriculum and instructional practices, as science teachers cannot make optimal instructional decisions with incomplete assessment information.
7. Questions Surrounding the Practical Application of MAPA
MAPA has the potential to serve as a model for both developing and analyzing written test items. Teachers and researchers can create assessment items by following guidelines that incorporate relevant linguistic and topical dimensions for MLs’ learning. The number of items should be sufficient to cover the scope of the assessment and provide a comprehensive evaluation of student abilities. Ensuring sound psychometric properties entails rigorous testing and validation of the items against the MAPA framework, supported by continuous data analysis. SFL and topic models can be applied both to individual items and across multiple items, revealing patterns and insights at various levels of the assessment. MAPA can then offer an innovative approach to understanding and supporting MLs in science education, providing educators with valuable tools to enhance teaching and learning outcomes.
8. Conclusions and Implications
This article aimed to explore how written science assessments for MLs can serve as vital tools for teachers striving to adopt culturally and linguistically sustaining practices. We addressed this objective by providing a comprehensive overview of the complexities inherent in the written language of science and analyzing current literature in the field. In addition, the Multidimensional Assessment Performance Analysis (MAPA) framework was introduced as a model for rethinking the assessment of writing in multilingual contexts. It is important to note that the topic of written assessment in science is relevant to all students, including native English speakers, as testing continues to be a critical element in evaluating student learning and teacher accountability nationwide [49].
In discussing the MAPA framework, we emphasized the importance of considering written science assessments in terms of both their design and interpretation. Aligning these two aspects can help teachers receive reliable information from assessments. Thus, written science assessments should be conceived as expansive and inclusive tools for evaluating MLs’ knowledge. Considerations such as the role of students’ home languages, translanguaging processes, and the application of metarepresentational competence should be central to the assessment research agenda.
In advancing the MAPA framework, we aim to reflect critically on the nature of traditional testing tools and their potential negative impacts on all students, especially MLs [50]. Indeed, traditional assessments often fail to capture the full range of MLs’ capabilities due to their emphasis on standardized language norms that do not account for the diverse linguistic repertoires of these students.
The foundational principles of MAPA draw from theoretical and pedagogical orientations that validate the “historical, linguistic, semiotic, and cultural” dimensions of working with MLs [26] (p. 297). These students bring rich linguistic and cultural backgrounds to the classroom, and assessments must reflect and honor this diversity. Reconceiving written science assessments is an urgent step toward recognizing the dynamic and growing K-12 student population. More importantly, it is crucial for the development and sustainability of justice-oriented science education.
In summary, MAPA, as a working framework, can provide a comprehensive approach to evaluating written science assessments, addressing both design and interpretation to better serve MLs. At its core, MAPA aims to create equitable and effective assessment strategies that recognize and support the unique needs and strengths of all students, with the aim of advancing science education in a manner that is inclusive, meaningful, and aligned with principles of social justice.
9. Recommendations
This article focused on providing theoretical evidence for expanding rubric-based scoring interpretations by analyzing students’ language using SFL and Topic Models, offering deeper insight into students’ responses with the potential to enhance equity and fairness post hoc.
Initially, MAPA used the simplest model (i.e., LDA) to analyze written assessments; LDA does not consider external variables to explain the clustering of words. As a result, LDA assumes that answers from native English speakers and MLs are similar. However, this is not always true, as MLs might lack coherence in their answers or use simpler words to explain science compared to native speakers. This could lead to MLs being grouped into a single topic rather than being reflected in the general underlying structure for all students.
To account for the linguistic and cultural diversity of MLs, we recommend using the Structural Topic Model (STM; [51]), which includes covariates to explain the topics. For example, STM detects a topic structure that shares semantic meaning across different groups (e.g., monolingual, bilingual, multilingual) while differentiating which topic(s) each group discusses more and which words each group predominantly uses within each topic. This approach might increase fairness while accounting for differences among groups. Additionally, if students’ answers are in multiple languages, the Polylingual Topic Model [52] could be a better approach, as it estimates a topic structure that shares semantic meaning across languages while keeping language-specific word distributions.
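STM itself is typically fit with the R package stm; as a rough approximation of its prevalence-covariate idea, one can fit a standard topic model and then compare average topic prevalence across groups, as in the sketch below. The topic proportions and group labels here are illustrative placeholders, not estimates from our data.

```python
# A rough approximation of STM's prevalence-covariate idea: compare average
# topic prevalence across language groups after fitting a standard topic
# model. The per-response topic proportions are illustrative placeholders
# (e.g., the doc_topics output of the LDA sketch above).
import numpy as np

doc_topics = np.array([
    [0.9, 0.1],   # response 1
    [0.2, 0.8],   # response 2
    [0.8, 0.2],   # response 3
    [0.3, 0.7],   # response 4
])
groups = ["monolingual", "multilingual", "monolingual", "multilingual"]

for g in sorted(set(groups)):
    mask = np.array([label == g for label in groups])
    prevalence = doc_topics[mask].mean(axis=0)
    print(g, prevalence.round(2))  # mean topic prevalence for this group
```

Unlike STM, this post hoc comparison does not let group membership inform the estimation of the topics themselves, which is precisely the advantage the covariate-based approach offers.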
Although this paper focuses on the potential benefits of using Topic Models and SFL to analyze test data in order to enhance equity and fairness post hoc, their effectiveness is tied to designing test items that consider the sociocultural context of learning. In other words, while Topic Models and SFL can offer insights into patterns and biases within test responses, the fundamental issues stemming from an unfair test design might still impede their overall utility in rectifying disparities. Our next steps are to develop guidelines for test design and to create a multidimensional test analysis that combines rubric-based test scores with SFL and Topic Model results to evaluate students’ learning comprehensively, tackling disparities that traditional test analysis might not account for.