Designing Data Science Learning in Initial Teacher Education: The EDUCATE Conceptual Framework

Leavy, Aisling; Kazak, Sibel; Podworny, Susanne; Frischemeier, Daniel

doi:10.3390/educsci16020307

¹ Department of STEM Education, Mary Immaculate College, V94 VN26 Limerick, Ireland

² Department of Mathematics and Science Education, Faculty of Education, Middle East Technical University, 06800 Ankara, Türkiye

³ Department of Mathematics, Paderborn University, 33098 Paderborn, Germany

⁴ Institute of Fundamental and Inclusive Mathematics Education, Faculty of Mathematics and Computer Science, University of Münster, 48419 Münster, Germany

* Author to whom correspondence should be addressed.

Educ. Sci. 2026, 16(2), 307; https://doi.org/10.3390/educsci16020307

Submission received: 5 January 2026 / Revised: 31 January 2026 / Accepted: 9 February 2026 / Published: 13 February 2026

Abstract

Data science has become central to contemporary social, civic, and professional life, yet its integration into initial teacher education remains fragmented and undertheorised. This paper addresses the need to support teacher educators in designing learning experiences that develop data science competence in pre-service teachers who are not data science specialists. A systematic scoping review of the literature was conducted across major academic databases and complemented by an expert-informed literature identification strategy. The review examined how data science is described conceptually, how it is structured within school curricula and teacher education, and what knowledge and practices are emphasised for teachers. Findings indicate that while core processes and practices of data science, such as problem formulation, data preparation, exploratory analysis, modelling, visualisation, and ethical engagement, are widely recognised, their translation into teacher education is inconsistent and often lacks coherence. In response, the paper presents a conceptual framework designed to support pre-service teachers in engaging with the processes and practices of doing data science. The framework offers a flexible, practice-informed structure that is accessible to non-specialist teachers and aligned with pedagogical decision-making in educational settings. The paper concludes by discussing how the framework, alongside practical considerations for enactment, can support the preparation of data-literate teachers capable of fostering critical, ethical, and inquiry-based engagements with data in schools.

Keywords:

data science education; initial teacher education; conceptual framework; teacher educators; pre-service teachers; data literacy

1. Introduction

Data Science has emerged as a critical interdisciplinary field that connects statistics, computer science, and application sciences (Biehler et al., 2018). It permeates all aspects of modern society, influencing domains such as healthcare, finance, education, and public policy. The rapid expansion of data availability and the increasing reliance on data-driven decision-making necessitate the development of data literacy skills among citizens. As the European Union’s “A Europe Fit for the Digital Age” campaign highlights (European Commission, 2019), fostering data competence is essential for democratic participation, informed decision-making, and responsible citizenship.

Despite the growing relevance of data science in K-12 education, there remains a significant gap in preparing teachers to effectively integrate data science into their curricula (Gould, 2017, 2021). With the relatively recent emergence of data science (Donoho, 2017), there are additional challenges placed on teachers including, but not limited to, the need for context knowledge, content and pedagogical content knowledge for teaching stochastics, the lack of availability of instructional materials and the need for access to and upskilling in technological tools to support the development of students’ data science understanding (Batanero et al., 2011; Leavy et al., 2025; Wilkerson et al., 2025). Without adequate support and preparation, teachers may struggle to equip students with the necessary skills to critically analyse, interpret, and use data in meaningful ways. Addressing this gap is particularly urgent given the prevalence of misinformation, fake news, and the role of data in contemporary issues such as sustainability, migration, poverty and global pandemics.

The DataSETUP project responds to these challenges by promoting data science literacy in schools through enhanced teacher education. Specifically, the project seeks to integrate data science education into pre-service teacher preparation across multiple disciplines at the university level. By doing so, it intends to build teachers’ capacity to incorporate data-driven approaches in their teaching and foster students’ competencies in working with data. The project aligns with key European initiatives, including the European Digital Skills Agenda (European Commission, 2025) and the Digital Competence Framework for Educators (DigCompEdu) (Redecker, 2017), which emphasise the importance of digital competence for educators.

2. Importance of Incorporating Data Science in Teacher Education

Data Science is increasingly recognised as a fundamental component of 21st-century education, offering powerful tools for problem-solving, critical thinking, and decision-making. As data becomes an essential resource in all fields, equipping students with the ability to understand and analyse data is crucial for their future success (Ridgway, 2022). Consequently, it is imperative that teacher education programmes integrate data science to ensure that pre-service teachers develop the knowledge and skills necessary to teach data literacy effectively.

For pre-service teachers, particularly those specialising in STEM disciplines, a strong foundation in data science is indispensable. Research suggests that such a foundation enables them to develop a deep understanding of data-driven reasoning, statistical thinking, and computational analysis (Ridgway, 2016, 2022) and to acquire the skills necessary to integrate data analysis, simulations, and visualisation tools into their teaching practices (Chance et al., 2007; Finzer, 2013; Wilkerson et al., 2025). It also supports these future teachers in engaging students in authentic, inquiry-based learning experiences that reflect real-world data applications (Gould, 2017; Weiland & Engledowl, 2022) and in fostering students’ ability to critically assess data sources, interpret findings, and make informed decisions (Friedrich et al., 2024; Schreiter et al., 2024).

Failure to integrate data science education into teacher preparation risks leaving educators unprepared to navigate the evolving landscape. Moreover, it may result in students lacking essential competencies for participation in a data-driven society. By embedding data science into teacher education, we can ensure that future educators are equipped with the necessary content knowledge, pedagogical strategies, digital tools, and conceptual understandings to teach data science effectively.

Despite growing recognition of the importance of data science in education, there is no coherent conceptual framework to guide how data science should be taught in initial teacher education. Existing frameworks and curricula largely target K-12 learners or focus on disciplinary perspectives such as statistics or computer science, leaving teacher educators without clear guidance on the processes and practices that pre-service teachers need to engage in to become competent teachers of data science. This paper addresses this critical gap by presenting a conceptual framework specifically designed for teacher education—one that articulates the processes, practices, and competencies required for pre-service teachers to learn how to do data science and, in turn, support their future students. The framework developed through the DataSETUP project thus offers a much-needed foundation for structuring teacher education programmes, developing professional learning materials, and ensuring that pre-service teachers are prepared to foster meaningful data literacy in a data-intensive world.

3. Methods

Development of the conceptual framework was informed by a research design consisting of three integrated components. The first two components, a systematic scoping review and expert-elicitation search, generated a body of readings. The third component, a structured analytic process, identified the key learnings from the readings to inform the development of the conceptual framework.

3.1. Systematic Scoping Review

The primary goal of this systematic literature review was to inform the development of the conceptual framework for data science in teacher education. It focused on research on data science as it pertains to data science literacy in education, existing data science curricula and frameworks, and data science in teacher education. Two separate but complementary searches were conducted:

▪ Data science education and literacy;
▪ Pre-service teachers’ data literacy.

Using a comprehensive set of search terms and filters, searches were undertaken across the following academic databases: ERIC (via EBSCO and ProQuest), Science Direct, and Web of Science. The searches were limited to English-language publications from 2013 to 2023. A variety of search phrase combinations were applied across these databases to ensure broad coverage (e.g., “data literacy” AND “teacher education”; “data science” AND “education”). This resulted in an initial pool of 1531 articles. The search results are presented in Table 1.

Duplicate records (n = 327) were removed using Zotero (version 6.0) reference management software, resulting in 1204 unique records for screening. Title and abstract screening was guided by predefined inclusion/exclusion criteria. Screening was conducted by three reviewers, who began by coding a shared sample of 30 articles to ensure alignment. Thereafter, the reviewers independently screened the remaining articles. Screening inclusion criteria differed slightly for the two search strands:

For “data science education/literacy,” studies were included if they: (1) reported on data science education or literacy; (2) involved pre-service or in-service teachers; (3) ideally referenced a framework, model, or curriculum; and (4) were situated in an educational context.

For “pre-service teacher data literacy,” studies were included if they: (1) reported on data literacy; (2) involved teacher education students; (3) ideally referenced a framework, model, or curriculum; and (4) were situated in an educational context.

This screening yielded 134 papers for full-text review. A further 21 were excluded following full-text reading due to:

▪ Focus on historical policy accounts;
▪ Emphasis on bespoke data management or software evaluation;
▪ Narrative opinion pieces without empirical data;
▪ An exclusive focus on computational thinking or programming without reference to data science.

All full-text reviews were completed by research teams working in pairs or small groups.

3.2. Expert-Informed Literature Identification Strategy

To complement the database-driven search, an expert-informed literature identification strategy was employed. This drew on the practice, commonly used in Delphi studies, of systematically consulting recognised experts to identify influential literature in emerging fields. Expert input is particularly valuable in rapidly developing areas such as data science, where research may be dispersed across disciplines and not yet fully indexed in academic databases, and is especially useful when developing curricula and learning experiences (Green, 2014). To generate the expert pool, members of the research team nominated individuals with established expertise in data science or statistics education. A total of 22 experts were contacted via email and invited to “share with us what you perceive as the three most interesting and relevant papers about data science literacy or data literacy in teacher education (only for teachers or pre-service teachers), including any papers you may have authored yourself.” Responses and associated papers were collated. Further hand searches were carried out in selected journals of relevance, including ZDM, Statistics Education Research Journal (SERJ), and The Journal of Statistics and Data Science Education.

These combined efforts from the expert-elicitation search and the hand search of journals yielded an additional 88 studies. Of these, 27 were duplicates (already identified in the initial pool), and 39 were excluded based on the same inclusion/exclusion criteria described earlier. The remaining 22 were included for full analysis, bringing the total number of studies analysed to 135. Figure 1 illustrates the PRISMA flow diagram outlining the identification, screening, and inclusion process for studies sourced through the systematic literature review (outlined in Section 3.1), as well as those added via hand-searching and expert recommendations (as described here in Section 3.2).

3.3. Structured Analytic Process

A set of guiding questions was developed to help researchers extract critical insights from the literature to inform the conceptual framework. The questions were designed to foreground disciplinary definitions, required prior knowledge, core data science concepts, and curricular approaches. They were intentionally broad to accommodate variation across the literature, yet specific enough to generate comparable insights:

▪ Question 1: How is data science described and defined?
▪ Question 2: What is the most important prior knowledge required for pre-service teachers?
▪ Question 3: What are the important data science concepts that pre-service teachers need to know?
▪ Question 4: How are approaches to teaching data science in curricula (school-level, undergraduate and initial teacher education) structured?

Guided by these questions, researchers read, summarised, and discussed the 222 extracted papers. Although all 222 papers were read, 87 studies were excluded at various stages during this process, leaving 135 for final synthesis. Each research team produced a written summary of findings, and these summaries were then integrated, compared, and refined collaboratively. Through iterative discussion, areas of convergence across the analyses were identified, leading to the development of two overarching themes that structure the presentation of findings in the following section. Approximately 60 of the 135 papers analysed made a unique and direct contribution to one or both themes and are cited in this paper. The remaining 75 studies offered supportive or contextual insights but did not contribute novel analytical findings.
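The record counts reported across Sections 3.1 to 3.3 can be reconciled with simple arithmetic. The short Python sketch below is illustrative only: the variable names are ours, and it assumes that the 222 papers read comprise the 134 database records retained for full-text review plus the 88 records from the expert-elicitation and hand searches, and that the 87 exclusions are the sum of the three exclusion counts reported above.

```python
# Record counts as reported in Sections 3.1-3.3 (variable names are illustrative).
initial_pool = 1531        # records returned by the database searches
duplicates_removed = 327   # duplicates removed with Zotero
full_text_db = 134         # database records retained for full-text review
excluded_full_text = 21    # database records excluded after full-text reading
expert_and_hand = 88       # records from expert elicitation and hand searches
expert_duplicates = 27     # already present in the initial pool
expert_excluded = 39       # excluded by the inclusion/exclusion criteria

# Assumption: the 222 papers read are the two full-text pools combined.
screened = initial_pool - duplicates_removed
papers_read = full_text_db + expert_and_hand
excluded_total = excluded_full_text + expert_duplicates + expert_excluded
final_synthesis = papers_read - excluded_total

print(screened, papers_read, excluded_total, final_synthesis)
# prints: 1204 222 87 135
```

Under these assumptions the computed totals match the figures reported in the text (1204 screened, 222 read, 87 excluded, 135 synthesised).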

The four guiding questions shaped the analytic process and are used to synthesise findings across the reviewed studies and frameworks. Questions 1 and 2, which concern how data science is described and what prior knowledge pre-service teachers require, were conceptually aligned and therefore informed the first major analytic theme: How Data Science is Described in the Literature. Questions 3 and 4, which focus on identifying essential data science concepts for teachers and examining how data science is structured in school, undergraduate, and teacher education curricula, naturally converged into the second analytic theme: How Data Science is Described and Structured in School Curricula and Initial Teacher Education. This amalgamation of four lines of inquiry into two higher-order themes reflects the way the literature clusters conceptually, allowing for a more integrated and coherent synthesis of findings.

3.4. Designing the EDUCATE Framework

The development of the EDUCATE framework followed a multi-phase, iterative process grounded in the findings of the systematic literature review. After identifying ten core components of data science practice (described in Section 4.1), the research team engaged in several rounds of categorisation, drawing on established frameworks in both data science (e.g., H. Lee et al., 2022; InSTEP [https://instepwithdata.org] accessed on 1 January 2026) and statistical inquiry. We examined how these components could be meaningfully grouped, seeking to identify recurring sequences or cycles of activity that define data science engagement.

This process led to the formulation of four overarching processes (Problem Formulation, Preparation, Analysis, and Interpretation & Communication) that represent the structural phases of a data science investigation. The remaining components were conceptualised as nine cross-cutting practices (e.g., computational thinking, ethical reasoning, and data visualisation) which support and enrich each stage of the process. This distinction was strongly informed by H. Lee et al. (2022), whose framework provided a clear model for differentiating between structural processes and embedded skills or habits of mind.

Early drafts of the model explored cyclical and nested structures, linking specific practices to particular processes. However, the team ultimately chose to represent the practices as flexible and non-linear. This structure preserves conceptual clarity while allowing teacher educators and pre-service teachers to adapt the framework to local contexts. Importantly, the EDUCATE framework is tailored to the needs of non-specialist K–12 educators. While it draws on professional data science models, we intentionally filtered out components that demand high levels of technical expertise or infrastructure (e.g., data infrastructure, web scraping). Instead, we retained elements that are both pedagogically meaningful and developmentally appropriate (for example, data ethics, visualisation, and basic statistical reasoning) thereby aligning with existing school curricula and teacher capabilities.

In addition to the core model, we identified four pedagogical considerations that influence how data science is taught in classrooms. These considerations were informed by literature and curriculum analysis and also reflect our acknowledgement of the broader ecological system in which teaching occurs (cf. Bronfenbrenner & Morris, 2007). While not conceptual elements of data science itself, they serve as practical supports for implementation, helping educators plan for real-world constraints and opportunities in their specific settings. For this reason, they are presented separately in the “Beyond the Framework” Section 6.

4. Findings

The first analytic theme, How Data Science is Described in the Literature, describes ten core components that recur across a wide range of data science frameworks and publications. This section is intentionally descriptive and conceptual in nature, aiming to provide readers with a coherent sense of the processes, practices, and forms of work involved in “doing” data science. As these components are consistently represented across the literature, they are presented here as an integrative synthesis rather than through attribution to individual sources. However, to enhance transparency and support traceability, Table 2 maps each component to representative sources from the literature reviewed, highlighting areas of convergence and variation across emphases.

In contrast, the second theme, How Data Science is Described and Structured in School Curricula and Initial Teacher Education, draws explicitly on empirical studies, curricular frameworks, and policy-oriented publications. Accordingly, this section is more heavily referenced, as it reports how data science is interpreted, prioritised, and enacted within specific educational contexts. The distinction between the two sections reflects a deliberate analytic move: the first establishes a conceptual foundation for data science practice, while the second examines how this foundation is translated into educational structures for learners and pre-service teachers.

4.1. How Data Science Is Described in the Literature

Data science, often described as the “Science of learning from data” (Donoho, 2017, p. 748), is an interdisciplinary field that draws on mathematics, statistics, computer science, and domain-specific knowledge to generate insights from data. While the field remains variably defined (Gould, 2021), conceptual frameworks of data science consistently attempt to articulate the processes and practices involved in doing data science, from problem formulation through to analysis, interpretation and communication. Following a comprehensive review of data science frameworks and the related literature describing professional data science practice, ten core components were identified that recur across these descriptions. These components are presented as a synthesis of the work and practices involved in data science, rather than as a prescriptive or exhaustive taxonomy. As such, they provide a conceptual foundation for understanding what it means to engage in data science practice and form the basis for examining how these practices are taken up, adapted, and structured within educational contexts.

4.1.1. Problem Formulation and Question Development

Access to open, large, messy and complex data sets poses challenges for the development of research questions that are sufficiently focused and well defined. Developing skill in problem formulation helps ensure that data science projects focus on relevant problems that can be effectively answered given the available data, account for the complexity of the data without oversimplifying or overcomplicating the problem, extract actionable insights, and take into consideration the ethical and broader impacts of the research.

4.1.2. Data Collection, Preparation and Processing

The shift toward use of real-world messy data necessitates skills in data processing. Consideration in the literature is given to sources of data such as surveys, databases, web scraping, sensors, pictures, text, GPS, maps and audio. Attention is also given to the format of data, with both structured (e.g., databases) and unstructured formats (e.g., video, text, images) mentioned. Frameworks also attend to data collection methods such as experiments, surveys and observational studies. The cleaning of data to ensure quality is emphasised through references to handling missing values, removing duplicates, correcting errors and considering outliers. Equal consideration is given to the need to transform data through normalising or scaling and to integrate data from different sources.
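To make the cleaning and transformation steps named above concrete, the following minimal Python sketch removes duplicates, drops records with missing values, corrects one kind of entry error, and rescales the result. The records, the field names, and the rule that heights below 3 were entered in metres are all hypothetical and chosen purely for illustration; they are not drawn from any of the reviewed frameworks.

```python
# Hypothetical class survey; None marks a missing value.
raw_records = [
    {"id": 1, "height_cm": 152.0},
    {"id": 2, "height_cm": None},    # missing value
    {"id": 3, "height_cm": 148.5},
    {"id": 3, "height_cm": 148.5},   # exact duplicate of the previous record
    {"id": 4, "height_cm": 1.5},     # probably entered in metres, not centimetres
    {"id": 5, "height_cm": 160.0},
]

def clean(records):
    """Remove duplicates, drop missing values, and correct unit errors."""
    seen, unique = set(), []
    for record in records:
        key = (record["id"], record["height_cm"])
        if key not in seen:
            seen.add(key)
            unique.append(dict(record))  # copy so the raw data stay untouched
    complete = [r for r in unique if r["height_cm"] is not None]
    for r in complete:
        if r["height_cm"] < 3:           # assumed rule: such values are in metres
            r["height_cm"] *= 100
    return complete

def min_max_scale(values):
    """Rescale values linearly so the smallest is 0 and the largest is 1."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]

cleaned = clean(raw_records)
heights = [r["height_cm"] for r in cleaned]
scaled = min_max_scale(heights)
print(len(raw_records), "raw records ->", len(cleaned), "cleaned records")
```

Dropping incomplete records is only one way to handle missing values. Deciding whether to drop, impute, or investigate them is itself part of the judgement that data preparation requires.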

4.1.3. Data Infrastructure and Big Data Technologies

These technologies are essential to data science because they enable the storage of data using databases and data warehouses, the processing of data, and cloud computing which offers cost-effective resources to handle computationally intensive tasks.

4.1.4. Data Governance, Ethics and Responsible Use

This requires critical evaluation of data sources and consideration of the reliability, biases, and ethical implications of using data from different sources. Ethical consideration is needed to counteract bias, both personal and cultural, alongside biases built into data sets. Attention to data governance policies relating to rules for data access, data sharing and data retention is critical, as is the use of best practices for data storage and management. The use of open data requires consideration of the ethical implications of using public data, and the use of data for social good can develop appreciation for projects that use data for positive social impact. Where students collect their own data, they need to develop understanding of informed consent, anonymisation, and respect for participants’ privacy.

4.1.5. Exploratory Data Analysis (EDA)

Broadly, exploratory data analysis approaches focus on the selection, calculation and interpretation of appropriate descriptive statistics (e.g., measures of centre, variability and correlation) to summarise and describe data, correlational analysis to explore relationships within the data, hypothesis testing, inference and pattern detection. Many of these exploratory data analytic techniques have their roots in statistical analysis and were originally developed within the discipline of statistics.
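As an illustration of the descriptive statistics mentioned above, the following Python sketch computes a measure of centre, a measure of variability, and a correlation coefficient using only the standard library. The two small data sets (hours of study and test scores) are invented for the example, and the helper `pearson_r` is our own illustrative function rather than part of any framework discussed here.

```python
from math import sqrt
from statistics import mean, median, stdev

# Invented data: weekly study hours and test scores for five students.
study_hours = [1, 2, 3, 4, 5]
test_scores = [52, 55, 61, 64, 68]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

summary = {
    "mean": mean(test_scores),        # measure of centre
    "median": median(test_scores),    # measure of centre, robust to outliers
    "sd": stdev(test_scores),         # sample standard deviation (variability)
    "r": pearson_r(study_hours, test_scores),  # strength of linear association
}
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

For these invented data the mean score is 60 and the median is 61, and the correlation is close to 1, indicating a strong positive linear association in this small sample.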

4.1.6. Data Visualisation and Visual Analytics

Data visualisation is a critical component of data science, enabling the representation of complex datasets through graphical means to facilitate understanding, pattern recognition, and decision-making. Unlike traditional statistical inquiry, where visualisation is primarily used to confirm hypotheses and summarise results (e.g., histograms, box plots, scatterplots), data science leverages visualisation as an interactive and dynamic tool for exploration. In data science, visualisation is frequently employed to identify patterns, anomalies, and relationships in large and unstructured datasets, often integrating real-time or multidimensional data representations that go beyond conventional statistical graphs. Data visualisation is closely connected to advancements in technology, enabling the use of interactive tools, real-time dashboards, and AI-driven analytics to enhance data exploration. It also leverages simulation and dynamic graphs, allowing users to model complex scenarios, test assumptions, and engage with data in ways that static graphs cannot. These technological integrations make visualisation a powerful medium not just for analysis, but also for communication and decision-making in data science.

4.1.7. Computational Thinking, Modelling, Algorithms, Machine Learning

Computational thinking involves breaking down complex problems into manageable parts, recognising patterns, and developing algorithms to process and analyse data efficiently. Key practices include decomposition, where data scientists dissect problems into smaller tasks; pattern recognition, which involves identifying trends or anomalies in data; and abstraction, which focuses on filtering out irrelevant information to highlight core aspects of the data. Additionally, algorithmic thinking is essential for designing step-by-step procedures that enable data analysis. Programming languages like Python and R play a crucial role in applying these practices, providing tools and libraries that allow data scientists to automate processes, handle large datasets, and implement complex analyses effectively. Machine learning comes into play here as it supports the development and application of algorithms that allow computers to make automated predictions from data. Modelling is the process of creating a mathematical representation of a real-world situation using data. A model can be a mathematical equation, a statistical algorithm, or a machine learning algorithm that describes how different variables in the data relate to each other.
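One simple instance of a model as a mathematical equation is a straight line fitted to data by ordinary least squares. The Python sketch below is a minimal illustration under stated assumptions: the data are invented, and the functions `fit_line` and `predict` are hypothetical names introduced here, not part of any framework reviewed in this paper.

```python
from statistics import mean

# Invented data: weekly study hours and test scores for five students.
study_hours = [1, 2, 3, 4, 5]
test_scores = [52, 55, 61, 64, 68]

def fit_line(xs, ys):
    """Ordinary least squares fit of the model y = intercept + slope * x."""
    mean_x, mean_y = mean(xs), mean(ys)
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_line(study_hours, test_scores)

def predict(hours):
    """Use the fitted model to predict a score for a given number of hours."""
    return intercept + slope * hours

print(f"score = {intercept:.1f} + {slope:.1f} * hours")
print(f"predicted score for 6 hours: {predict(6):.1f}")
```

The example also reflects the practices described above: the task is decomposed into fitting and predicting, and the model abstracts five data points into two numbers. For these invented data the fitted slope is 4.1 score points per additional hour.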

4.1.8. Statistical and Mathematical Foundations

Many definitions and frameworks retain a strong focus on probability theory and statistical inference alongside a foundation in linear algebra and calculus (important for algorithmic understanding and machine learning). For example, Bayesian inference, a fundamental concept in probability, is widely used in machine learning for updating beliefs in light of new data. Similarly, probabilistic models rely on probability theory to identify patterns, make predictions, and handle uncertainty in data-driven applications.
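The idea of updating beliefs in light of new data can be shown with a small worked example. The Python sketch below uses a Beta prior for an unknown proportion and updates it with observed successes and failures (the standard conjugate Beta-Binomial update). The scenario, the uniform Beta(1, 1) prior, and the function names are assumptions made for illustration only.

```python
from fractions import Fraction

def update_beta(alpha, beta, successes, failures):
    """Conjugate update: Beta(alpha, beta) prior plus binomial data."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution, as an exact fraction."""
    return Fraction(alpha, alpha + beta)

# Assumed prior: Beta(1, 1), i.e. every proportion is equally plausible.
prior = (1, 1)

# Hypothetical data: 7 of 10 surveyed students prefer option A.
posterior = update_beta(*prior, successes=7, failures=3)

print("prior mean:", beta_mean(*prior))          # prints: prior mean: 1/2
print("posterior mean:", beta_mean(*posterior))  # prints: posterior mean: 2/3
```

The estimate moves from the prior mean of 1/2 towards the observed proportion of 7/10, without reaching it, because the prior still carries some weight after only ten observations.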

4.1.9. Domain Knowledge and Interdisciplinary Collaboration

The contextual relevance of data science is emphasised through the focus on collaborating with domain experts to ensure accuracy of data insights. Collaboration within the context of working in multidisciplinary and cross-functional teams is essential in data science, as real-world data problems require expertise from multiple fields to ensure robust analysis, ethical considerations, and meaningful interpretation. Disciplines such as computer science, mathematics, and statistics contribute technical foundations, while domain-specific fields like healthcare, finance, education, and environmental science provide context and application, and ethics, social sciences, and philosophy help address issues of bias, fairness, and societal impact.

4.1.10. Collaboration and Communication Skills

The importance of effective teamwork and communication is evident across many frameworks, which identify the need to develop students’ collaborative and communication skills. These skills prepare students to work with diverse teams, to communicate findings from complex data sets clearly and concisely, to document their data science process, and to interact with stakeholders to define project goals. They also prepare students to use the language of uncertainty when there is no clear right or wrong answer and to collaborate in making responsible decisions when addressing ethical concerns around data privacy, bias, and fairness in data science projects. This incorporates attention to data communication through the use of comprehensible report writing and storytelling with data.

4.2. How Data Science Is Described and Structured in School Curricula and Initial Teacher Education

The components outlined in the previous section describe the core processes and practices involved in doing data science, as they are represented across professional and conceptual accounts of the field. However, the ways in which these components are interpreted, prioritised, and structured within educational contexts, particularly at school level and in initial teacher education, are neither uniform nor comprehensive. The following section therefore examines how these core components of data science are taken up within curricula and frameworks designed for learners and pre-service teachers. Drawing on empirical studies, curricular frameworks, and programme descriptions, this analysis explores how data science processes and practices are translated into educational structures, highlighting areas of alignment, emphasis, and omission in school curricula and teacher education.

Our review of data science conceptual frameworks and of the research literature reporting on data science courses for school-level students, teachers and other undergraduate students points towards components considered necessary for non-data science majors. There are a small number of data science frameworks specifically designed for educators, including schoolteachers and higher education instructors. These frameworks aim to integrate data science principles into the classroom, helping teachers guide students in understanding data, developing questions, utilising computational thinking skills, and applying data-driven decision-making, amongst other emphases. A feature common to many of these frameworks is a combined emphasis on data science processes and data science practices. These broadly align with the components of data science described in the previous section.

The reviewed frameworks vary not only in structure but also in the grade levels they target, and this shapes the emphasis placed on different components of data science. For example, the IDS and IDSSP frameworks, developed for high school learners, engage with a broad range of data science processes, including computational modelling, algorithmic thinking, and big data technologies. In contrast, the Ow-Yeong et al. (2023) framework, designed for elementary mathematics education, places greater emphasis on problem formulation, real-world data contexts, exploratory data analysis, and visualisation, as well as ethical awareness and data storytelling. The activities described in their framework are closely tied to curriculum-aligned content (e.g., bar graphs, pictographs) but framed through the lens of open, authentic inquiry using real data. Notably, components such as machine learning, infrastructure, or advanced modelling are absent, reflecting age-appropriate pedagogical choices. This distinction reinforces the importance of aligning data science education with learners’ cognitive and developmental readiness and illustrates how the components in Theme 1 may be differently prioritised across educational levels.

4.2.1. Data Science Processes

Data science processes are “the steps taken in order to achieve full understanding of data science procedures, including formulating questions, collecting data, analyzing data (visuals and models), and communicating results” (Thompson & Arastoopour Irgens, 2022, p. 30). This emphasis is visible in many of the frameworks that represent the data science process as a structured, systematic sequence of steps or stages to follow in order to solve a data-driven problem, i.e., a workflow that guides the entire lifecycle of a data science project. In the field of statistics education, there have been several frameworks consisting of four-phase (Franklin et al., 2007; Graham, 1987) and five-phase statistical inquiry cycles (Watson et al., 2017; Wild & Pfannkuch, 1999) representing approaches to solving problems with data. Examination of data science frameworks, described below, suggests broad alignment with these statistical inquiry cycles.

The Introduction to Data Science (IDS) curriculum (Gould et al., 2022), designed for high school students, covers key data science concepts such as data collection, cleaning, visualisation, analysis and real-world data projects that engage students in hands-on learning. A representation of a four-phase data cycle is used in the IDS curriculum to illustrate some data science processes (see Figure 2). The cycle of learning from data is also placed at the centre of data science (Figure 3) in the International Data Science in Schools Project (IDSSP) (IDSSP Curriculum Team, 2019), an initiative aimed at developing a comprehensive framework and curriculum for teaching data science in schools worldwide. In contrast, the outer circle of Ow-Yeong et al.’s (2023) depiction of data science describes the data science process (see Figure 4); its steps broadly align with the IDS and IDSSP data cycles. However, this framework incorporates specific reference to data science practices and locates them within mathematics and statistics (e.g., probability theory), computer science (e.g., data processing, programming, algorithms) and domain applications (e.g., sciences, public policy). The Data Science Ethos Lifecycle (Boenig-Liptsin et al., 2022) presents a six-stage data science workflow that incorporates a focus on exploratory data analysis and the use of analytical tools such as modelling (Figure 5). Similarly, the six-phase data investigation process proposed by H. Lee et al. (2022) (Figure 6) incorporates the consideration of models. The authors emphasise that the distinguishing feature of models within the context of data science is that data scientists choose “specific models as evidence to support claims that address an investigative question, often discarding models that do not help answer a question” (p. 13). The authors also make efforts to communicate the, at times, nonlinear and dynamic nature of the work that can occur simultaneously within and between phases.

These frameworks emphasise the data science process, which consists of steps closely aligned with well-established statistical inquiry cycles. In addition, aligned with these steps they incorporate several key data science components outlined in the previous section (Section 4), notably Data Preparation and Processing (Collect Data), Exploratory Data Analysis and Data Visualisation (Analyse Data), and Communication Skills (Communicate Conclusions). These components are identified also in research examining and critiquing data science education for pre-collegiate (Adisa et al., 2024; LaMar & Boaler, 2021; V. R. Lee & Delaney, 2022) and undergraduate (non-data science majors) students (B. Baumer, 2015; Li et al., 2023; Yan & Davis, 2019). Arising from her review of over 250 peer-reviewed articles, book chapters, and International Association for Statistical Education (IASE) conference proceedings, Davidson (2024) emphasises that it is the use of investigative projects in statistics and data science courses that allows students to experience the entire cycle of a statistical or data science investigation process.

A key feature shared by data science processes and traditional statistical inquiry cycles, particularly emphasised in frameworks and studies related to teachers, is the importance of developing statistical questions and identifying problems (e.g., the IDS curriculum, the Data Science Ethos Lifecycle, and the IDSSP framework). These frameworks, along with several research studies, highlight the critical role of asking meaningful and relevant questions in educational contexts (LaMar & Boaler, 2021; Leavy & Frischemeier, 2022; V. R. Lee & Delaney, 2022; Yan & Davis, 2019). For example, Dichev and Dicheva (2017) identified the ability to formulate productive questions as an essential skill in their recommended set of competencies for a general education data science course for non-technical students.

4.2.2. Data Science Practices

Compared to data science processes, data science practices are more focused on the day-to-day activities, tools and ethical considerations within each step of the data science process. The practices focus on how tasks are carried out on a granular level, which techniques and tools are employed, and why certain methodologies may be preferred. This relationship between steps in the data science process and the associated data science practices that support those steps is highlighted in the original IDSSP framework, in H. Lee et al.’s (2022) data investigation process and in Ow-Yeong et al.’s (2023) framework. Because the schematics presented in this paper focus on overarching processes rather than detailed practices, we encourage readers to consult the original articles and visual frameworks for a fuller representation of the specific data practices aligned with each phase of the data science cycle.

Data Collection, Preparation and Processing

Data science, by its nature, deals with larger and more complex data sets than those traditionally used in school contexts. These data are often characterised by the multiple Vs (see Kitchin & McArdle, 2016) of volume (enormous amounts), velocity (rapid and real-time creation of data), variety (heterogeneous nature of data), value (usefulness of data due to the multiple insights that can be gained) and veracity (truthfulness, accuracy and precision), though additional characteristics are also noted. The preparation and processing of such data to make it suitable for analysis presents considerable challenges within the classroom context (Kjelvik & Schultheis, 2019). However, attention to the proper preparation of data has been provided in data science courses for high school students (LaMar & Boaler, 2021), undergraduate liberal arts students (B. Baumer, 2015) and for undergraduate data analytics and statistics students who explore web scraping techniques (Dogucu & Çetinkaya-Rundel, 2021). Attention to data preparation and processing is also incorporated into several frameworks, including the ‘processing phase’ of H. Lee et al. (2022), which encompasses data organising, structuring, cleaning, and transforming. Similarly, the IDSSP framework highlights this aspect in its ‘getting data’ phase, which involves data harvesting and wrangling/munging. The centrality of this data science practice is also evident in the data science framework developed by Keller et al. (2020), which positions data wrangling (data profiling, preparation and linkage) in the centre of the framework.
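To make the organising, cleaning, and transforming steps named above more concrete, the following is a minimal sketch of what such preparation might look like on a very small data set. It is an illustration only and is not drawn from any of the cited frameworks; the field names (`room`, `temp_c`, `humidity`), the values, and the helper functions are all hypothetical.

```python
# Hypothetical raw readings; field names and values are invented for illustration.
raw_rows = [
    {"room": "A1", "temp_c": "21.5", "humidity": "40"},
    {"room": "a1 ", "temp_c": "", "humidity": "42"},      # missing temperature
    {"room": "B2", "temp_c": "19.0", "humidity": "n/a"},  # non-numeric humidity
    {"room": "B2", "temp_c": "19.0", "humidity": "n/a"},  # exact duplicate
]


def to_float(text):
    """Transform text into a float, or None when it is not a number."""
    try:
        return float(text)
    except ValueError:
        return None


def clean(rows):
    """Standardise labels, convert numeric fields, and drop exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        record = (
            row["room"].strip().upper(),  # structuring: consistent labels
            to_float(row["temp_c"]),      # transforming: text to number
            to_float(row["humidity"]),
        )
        if record in seen:                # cleaning: skip repeated records
            continue
        seen.add(record)
        cleaned.append(
            {"room": record[0], "temp_c": record[1], "humidity": record[2]}
        )
    return cleaned


cleaned_rows = clean(raw_rows)
# Keep only rows with no missing values for analyses that need complete cases.
complete_rows = [r for r in cleaned_rows if None not in r.values()]
```

Whether incomplete rows are dropped, as in the last line, or retained and flagged is itself a judgement that learners can be asked to discuss.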

Data Infrastructure and Big Data Technologies

As teachers increasingly engage their students in data science and handle large volumes of educational data themselves, it is important for them to grasp the basics of big data technologies. Indeed, the use of “big data” is becoming more prevalent, as is evident in a study by Liston et al. (2022), who, responding to a brief from elementary students, used an IoT environmental monitoring system to collect, analyse and visualise data on the light, humidity, sound and temperature in the school environment, and in Higgins et al. (2022), who engaged 11–14-year-olds in analysis of open source data from the US Centers for Disease Control and Prevention (CDC). Another example is the Mobilize Introduction to Data Science (IDS) curriculum (Gould et al., 2016), which uses an open-source system that supports and manages the flow of publicly available data from mobile devices to the classroom, whereupon the data are analysed by students in an effort to understand their community. Studies have also engaged pre-service teachers in analysing big data through analysis of heat maps sourced from publicly available data showing crime statistics in two major cities (Andersson & Register, 2023).

Data Governance, Ethics and Responsible Use

Ethical considerations “lie at the heart of data science” (National Academies of Sciences, Engineering, and Medicine, 2018, p. 30). Across the literature there is growing awareness of the need for teachers and non-DS majors to attend to data governance and ethics in data science because this will equip students with the knowledge and skills to use data responsibly, ethically, and effectively. Furthermore, it helps students understand the broader impact of their work, ensures they handle data with care, promotes fairness and equity, and prepares them for the ethical challenges they will encounter in their professional lives. For teachers, incorporating these topics into the curriculum is vital for developing not just skilled data scientists, but also conscientious and socially responsible professionals. The importance of attending to ethics is emphasised in the literature for undergraduate students who are non-DS majors (Dichev & Dicheva, 2017), alongside teachers of primary (Fry & Makar, 2021), secondary (V. R. Lee & Delaney, 2022) and college (B. S. Baumer et al., 2022; Li et al., 2023) students. The issue of ethics transcends all aspects of data science. For example, the increasing use of machine learning models in education has led to concerns about the reliance on recommendations from such models (Bach et al., 2022) and has resulted in efforts to establish the ways in which college students justify the usability of self-built models (Bata et al., 2025).

The need for awareness of personal biases and how they might impact the questions posed, variables selected, and communication of conclusions has been identified by Wild and Pfannkuch (1999). In relation to liberal arts students, B. S. Baumer et al. (2022) identify important DS ideas in relation to the ethics of data science as: (1) Ethical precepts for data science and codes of conduct, (2) Privacy and confidentiality, (3) Responsible conduct of research, (4) Ability to identify “junk” science, and (5) Ability to detect algorithmic bias. However, there appears to be consensus that data governance and ethics do not receive sufficient attention in college data science programmes and require continued and focused attention (Oliver & McNeil, 2021). An effort to examine pre-service mathematics teachers’ ethical reasoning in big data carried out by Andersson and Register (2023) indicated that pre-service teachers presented a diverse range of ethical arguments related to data access, which supported their efforts to critically examine oppressive situations. However, their reasoning may be constrained by a limited understanding of data science methodologies, suggesting a need for greater emphasis on these concepts in mathematics teacher education. Evidence to support the role played by data science methodologies, in this case understanding of specific model knowledge, is provided by Lieben and Gürtler (2025), who found that such knowledge increased the accuracy of German 10th grade students’ interpretations of the stochastic variations in simulations when examining scientific epidemic models. However, being a data science major is not sufficient to ensure proper grounding in ethics, as identified by Oliver and McNeil (2021), who reviewed 18 undergraduate data science programmes and found they lack sufficient focus on the ethics of data use and misuse.

Data Visualisation

Research on data science education at the school level (K-12) and in teacher education highlights the essential role of data visualisation in fostering data literacy and analytical skills. Several school-level curricula emphasise visualisation as a means of engaging students with real-world data, supporting inquiry-based learning, and enhancing computational thinking (Weiland & Engledowl, 2022). The IDSSP framework (International Data Science in Schools Project) integrates visualisation as a core component, encouraging students to explore and interpret data dynamically rather than relying solely on numerical summaries. Similarly, the Bootstrap: Data Science curriculum (Schanzer et al., 2022) incorporates visualisation as a foundational tool for exploring relationships in datasets and making sense of complex information. Growth in use of “data talks” and data discussion tasks about socially relevant data visualisations (e.g., LaMar & Boaler, 2021; Flavin & Suh, 2024; Wilkerson et al., 2025), alongside use of frameworks to support discussion about data displays (Friel et al., 2001; Thrasher et al., 2024), and data visualisations focusing on important civic issues (e.g., Gapminder, https://www.gapminder.org (accessed 1 January 2026)) have drawn greater attention to the use of visualisation in school statistics. In teacher education, studies have shown that pre-service teachers often lack confidence in their ability to teach data visualisation (Groth & Meletiou-Mavrotheris, 2017) and experience challenges interpreting non-traditional representations (Gonzales, 2025), despite recognising its importance in developing students’ data reasoning skills. However, research also suggests that when pre-service teachers engage in hands-on experiences with visualisation tools and interactive data exploration, their understanding of data science concepts improves, and they become better equipped to integrate visualisation into their teaching practices (Wilkerson et al., 2025). The importance of starting an introductory data science course with visualisation is emphasised by Çetinkaya-Rundel and Ellison (2021), as it leverages students’ intuitive understanding and allows for a gradual transition to more complex statistical concepts. Additionally, visualisation provides immediate feedback, making errors easier to detect compared to tasks like data wrangling or modelling. These findings underscore the need for explicit instruction in data visualisation within both K-12 curricula and teacher education programmes to ensure that future educators can effectively incorporate visual data representations in their classrooms.

Exploratory Data Analysis (EDA)

Research over the past several decades has extensively explored ways to support pre-service teachers in developing statistical reasoning, recognising that many of these skills are foundational to data science, resulting in the development of a strong literature base on teaching and learning of data and statistics (Weiland & Engledowl, 2022). Studies indicate that while pre-service teachers often have familiarity with basic descriptive statistics, they may struggle with deeper conceptual understandings of variability, correlation, and inference, key components of Exploratory Data Analysis (EDA) (Garfield & Ben-Zvi, 2008; Shaughnessy, 2007). Efforts to enhance statistical reasoning have included integrating dynamic data visualisation tools, simulations, and real-world data sets into teacher education programmes to promote a more intuitive and inquiry-driven approach to data analysis (Groth & Meletiou-Mavrotheris, 2017; De Vetten et al., 2023). Importantly, the ability to interpret trends, recognise patterns and make data-driven inferences aligns closely with the skills required in data science. As such, strengthening pre-service teachers’ EDA competencies not only enhances their statistical literacy but also prepares them to engage meaningfully with data-intensive problems across educational and professional contexts.

Computational Thinking, Modelling, Algorithms and Machine Learning

The importance of Computational Thinking, Modelling, Algorithms and Machine Learning is evident in the recent literature. The role of programming skills in statistics and data science practice is being increasingly acknowledged (Nolan & Temple Lang, 2010); however, Horton and Hardin (2021, p. 51) refer to the “notable gap … between our intentions and our actions”. In a special issue dedicated to integrating computing in the statistics and data science curricula, Horton and Hardin (2021) identified three non-mutually exclusive approaches that might be fruitful in this regard: creative restructuring of curricula, the incorporation of novel or technical data science skills into statistics courses (for example, web scraping) and the embedding of computational thinking skills in courses. Indeed, this ability to think computationally was identified as a key skill by Dichev and Dicheva (2017) in their design of a general education course on data science for non-technical students. The degree to which high-level computational or programming skills are required for non-data science majors, however, remains unclear, with Overton and Kleinschmit (2022, p. 362), in their description of a Data Science Literacy Framework to incorporate data science principles into public administration programmes, referring to the ‘faulty assumption that data science tasks require statistical and computational sophistication’.

Programming receives mention in several frameworks and curricula for school-level data science. In Ow-Yeong et al.’s (2023) data science framework, programming is located within the domain of computer science. There is reference to programming in the Bootstrap: Data Science curriculum (Schanzer et al., 2022) in the ingredient called ‘Computing’, which also incorporates attention to data acquisition, management and cleaning. Similarly, Weiland and Engledowl (2022) argue for the need to teach, within K-12 curricula, programming that is relevant for data wrangling and analysis. Some school curricula at the early years of school education lay the foundations for programming through the use of visual languages and block-based coding platforms (Datta & Nagabandi, 2017). Focusing on the later elementary years, a study by Thompson and Arastoopour Irgens (2022) with 11–13-year-olds used a combination of nonprogramming activities (e.g., use of Google Trends to explore and visualise data from public Google searches) and programming activities (using the language R). A study by Schönbrodt and Franke (2025) explored how the foundational principles of machine learning can be mapped onto mathematics curricula. Focusing on classification problems and Support Vector Machines (SVM), they examined how key mathematical concepts, such as distances and the dot product, can be introduced through structured learning trajectories in secondary education. However, despite the increasing recognition of computational thinking, modelling, algorithms, and machine learning in data science education, many educators and learners still face challenges in acquiring the necessary competencies. The gap between the intended integration of programming skills and their actual implementation in statistics and data science education, as noted by Horton and Hardin (2021), underscores this need. Similarly, Msweli et al. (2023), in a scoping review of data science education, identified a lack of competencies in working with data platforms, models, and tools as a key challenge in data science education. Addressing these gaps requires continued efforts to embed computational thinking and programming into curricula in ways that are accessible and relevant for both pre-service teachers and students at various educational levels.
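As an illustration of how the dot product and distance underpin linear classification of the kind mentioned above, the following sketch shows the decision rule of a linear classifier in two dimensions. It is not a trained Support Vector Machine and is not taken from Schönbrodt and Franke (2025); the weights `w`, the offset `b`, and the sample points are hand-picked, hypothetical values chosen so the arithmetic can be checked by hand.

```python
import math


def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def signed_distance(point, w, b):
    """Signed distance from a point to the line w . x + b = 0."""
    return (dot(w, point) + b) / math.sqrt(dot(w, w))


def classify(point, w, b):
    """Assign class +1 or -1 according to the side of the line."""
    return 1 if dot(w, point) + b >= 0 else -1


# Hand-picked (not learned) separating line 3x + 4y - 10 = 0.
w = (3.0, 4.0)
b = -10.0

points = [(0.0, 0.0), (2.0, 3.0)]
labels = [classify(p, w, b) for p in points]
distances = [signed_distance(p, w, b) for p in points]
```

Here the origin lies two units from the line on the negative side, while the point (2, 3) lies 1.6 units away on the positive side. A Support Vector Machine would additionally choose `w` and `b` so that the smallest such distance over the training data is as large as possible, a step omitted from this sketch.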

Domain Knowledge and Interdisciplinary Collaboration

Domain knowledge and interdisciplinary collaboration are key components of data science. The need for domain-specific education was emphasised by Oliver and McNeil (2021) in their evaluation of undergraduate data science programmes at a subset of 4-year institutions in the United States. The authors acknowledged the degree to which institutions were addressing communication in data science courses. The importance of developing teamwork and collaboration skills in data science courses was outlined by Vance (2021), who recommended a pedagogical strategy called Team Based Learning that enhances these skills, and was also acknowledged by Wu et al. (2023, p. 626), who state that “data science requires a multidisciplinary approach. Not only does it need to be closely tied to data-driven technologies, but it also needs to represent the contributions of different disciplines in data science objectively.”

Collaboration and Communication Skills

The data science literature and associated frameworks emphasise the critical role that communication and collaboration skills play in the life of a data scientist (Kauermann & Seidl, 2018; National Academies of Sciences, Engineering, and Medicine, 2018; H. Lee et al., 2022), with Roseth et al. (2008, p. 1) aptly stating that “Collaboration is not just an end goal of statistics instruction but also a means to help students learn statistics.” Consequently, approaches such as small group cooperative learning (Kalaian & Kasim, 2014; Roseth et al., 2008) and team-based learning (Vance, 2021) have been adopted within statistics and data science courses. The importance of communicating results and making evidence-based claims is embedded in all data science frameworks (e.g., IDSSP framework, H. Lee et al., 2022).

In conclusion, the analysis presented in this theme demonstrates that while many data science curricula and frameworks emphasise common processes and practices, there is considerable variation in how these are structured, prioritised, and pedagogically enacted. This is particularly the case for learners who are not data science majors. Across school curricula and initial teacher education, core components of data science are often introduced in fragmented ways, with limited guidance on how pre-service teachers might coherently experience and integrate the processes and practices involved in doing data science. These findings highlight the need for a conceptual framework that is explicitly designed for teacher education: one that foregrounds data science practice, supports non-specialist learners, and provides a clear structure for engaging pre-service teachers in authentic data science activity. The EDUCATE framework was developed in response to this need.

5. The EDUCATE (Empowering Data Science Understanding for Teacher Education) Conceptual Framework

The EDUCATE framework draws directly on the core components of data science identified in Theme 1 and the curricular patterns and challenges identified in Theme 2, translating these insights into a pedagogically oriented structure for initial teacher education. The framework is targeted at two audiences:

Preservice teachers. Preservice teachers do data science from a learner’s perspective. For this audience, the framework is about doing data science.
Instructors of preservice teachers. For this audience, the framework is about teaching and doing data science. They are teaching preservice teachers and bringing them through the activities that involve doing data science.

While the EDUCATE framework outlines core dimensions of data science practice and pedagogy, it is not intended as a one-size-fits-all model for K-12 education. Instead, it is designed to support pre-service teachers and teacher educators in understanding and implementing data science in developmentally appropriate and context-specific ways. The relevance and application of each component will naturally vary by school grade level. For example, teachers of upper primary students may engage them in data collection, visualisation and ethical reasoning using real-world contexts, while secondary students may explore more advanced topics such as modelling, algorithmic thinking, or predictive analysis. Consequently, the framework is intended to be adaptable and to provide a structure to guide pedagogical planning.

The framework (see Figure 7) describes the four-component PROCESS for doing data science and presents the data science processes and aligned data science practices. Its intent is to support the design of learning experiences for preservice teachers that guide them in navigating both the processes and practices involved in doing data science, enabling them to develop a coherent and integrated understanding of data-driven inquiry.

The data science process, tailored for pre-service teachers, is a streamlined four-component process. Data science is not seen as a cyclical process, but rather as four components that interconnect and do not have a fixed sequential order: Get and explore data → formulate problem → model/analyse data → communicate results and action plan.

The nine data science practices are aligned with the four components of the data science process, with many of the practices occurring during all of the processes. The practices focus on how tasks are carried out on a granular level, which techniques and tools are employed, and why certain methodologies may be preferred. The data science practices associated with the data science processes are described in Table 3.

Get and Explore Data: This process involves gathering data either firsthand or by sourcing pre-existing data sets (often large and open source). Where pre-existing or secondary data are sourced, exploration of the data set also occurs here to gain insights into the structure of the data (e.g., identification of variables, types of data, distributions, visualisations).
Formulate the Problem: This process involves defining and framing the specific question or problem that the data science project aims to solve. Problem formulation is usually the first step in the data science process which then drives the gathering of data. Teachers, however, often source publicly available data sets for use in classroom contexts and in these situations, problem formulation may occur after the data have been sourced or explored.
Model/Analyse the problem: This process involves exploring data to uncover patterns, trends, and insights relevant to the problem. Where relevant, pre-service teachers are introduced to predictive modelling techniques that can be applied to reveal insights into the data.
Communicate Results and Engage in an Action Plan: The final process involves creating a narrative that explains the key findings and subsequently translating these data insights into actionable recommendations for self and society.

Table 3 is a synthesis of ten components that describe the nature and scope of data science practices, based on an integrative review of existing frameworks, curricula and empirical literature. While these components reflect broadly agreed-upon aspects of professional data science work, the table is not a direct extraction from the literature. Instead, it represents an amalgamation of research-based insights and reasoned judgement by the research team, guided by a commitment to educational relevance. Each component was carefully reviewed to determine its suitability and adaptability for use in school-based data science instruction, particularly in primary and secondary contexts.

Some elements were judged to have limited direct transferability to school settings due to their reliance on complex tools, technical expertise, or infrastructure that may not be available in educational environments. Conversely, other areas such as problem formulation, exploratory data analysis, and data ethics were not only well-supported in the literature but also highly relevant and feasible for classroom implementation, and thus were emphasised. The resulting table provides a conceptual bridge between professional practice and school-level adaptation. It foregrounds the core practices that teachers need to understand and help students engage with, without assuming specialist training.

Positioning the EDUCATE Framework in the Landscape of Data Science Education

The EDUCATE framework builds upon and extends existing models of data science practice. In contrast to other influential frameworks designed for professional or upper-secondary learners (H. Lee et al., 2022), the EDUCATE framework was explicitly developed for use within initial teacher education, targeting pre-service teachers as non-specialist learners who are preparing to teach data science at the school level. To meet the needs of this audience, in contrast to frameworks comprising five or six distinct steps (Boenig-Liptsin et al., 2022; H. Lee et al., 2022; Ow-Yeong et al., 2023), EDUCATE adopts a streamlined structure comprising four core processes that span the data science cycle. This simplification is intentional: by reducing the cognitive and technical demands placed on users, the framework aims to make data science more accessible to those without formal training in statistics, computer science, or data science.

A further point of differentiation lies in the treatment of data science practices. Unlike several other models that tightly link specific practices (e.g., data wrangling, modelling, or visualisation) to a discrete data science process, the EDUCATE framework intentionally does not align specific practices with specific processes. This design reflects the recognition that data science investigations are highly contextual, and that particular practices may be utilised in several of the data science processes in a project. This flexible approach recognises the iterative, non-linear nature of real-world data inquiry.

Lastly, the EDUCATE framework emphasises the development of action-oriented outcomes. Like H. Lee et al. (2022), it includes a final process focused on communication and action, a step often missing in other frameworks. This reflects a growing understanding that data science, especially in educational contexts, should not end with interpretation but should inform decision-making and engagement with real-world issues. Together, these features (accessibility for non-specialists, flexibility in practice-process alignment, and a focus on purposeful action) position the EDUCATE framework as a novel and pedagogically grounded contribution to the field of data science education.

6. Beyond the Framework: Practical Considerations for Implementation

While the EDUCATE framework outlines the core processes and practices of doing data science, additional practical considerations are essential for those planning to teach data science in real classroom settings. This section is intended primarily for pre-service teachers (PSTs) who are beginning to think about how they might design and deliver a data science module in their future classrooms. It may also support teacher educators seeking to help PSTs bridge the gap between conceptual understanding and classroom practice. A recurring question for PSTs—What do I need to consider when teaching data science in my classroom?—is an important question, as data science offers pedagogical challenges and possibilities that differ in important ways from traditional statistics instruction. Embracing data science often requires a shift in mindset, one that involves greater openness to complexity, uncertainty, and the messiness of real-world data. These considerations are not intended to replace or extend the framework, but rather to support pre-service teachers in making informed pedagogical decisions when enacting the framework in classroom settings.

At the school level, the most immediate difference is the nature of the data used. Unlike the tidy, clean, and often contrived data sets used in traditional statistics lessons, data science frequently engages students with large, messy, multi-variable, real-world datasets. These data may require preparation, cleaning, and transformation before they are even ready for analysis; tasks that are not typically part of school-level statistics instruction but are central in data science. Teachers must therefore make careful choices about how to scaffold students’ experiences with such data while still maintaining cognitive and curricular coherence. The INSTEP programme at NC State (https://instepwithdata.org/public/about/ accessed 1 January 2026), and seminal publications such as Ben-Zvi et al. (2017), incorporate several of the considerations discussed below. Building on that model and the outcomes from our review, we present four key pedagogical aspects that pre-service teachers and teacher educators should reflect on when planning, teaching, and evaluating a data science activity. Alongside each consideration, we offer an illustrative example of how it may translate into classroom decision-making.

6.1. Context

Real-world contexts are at the heart of data science teaching. They not only ground the content in students’ everyday lives but also demonstrate the relevance and power of data science for understanding the world. Teachers should select contexts that are authentic, engaging, and meaningful for their students, whether drawn from sports analytics; environmental data and climate change; social media and digital trends; health and wellness; civic data (e.g., voting, local issues); social justice, equity, and sustainability; or economics, food, and agriculture. The choice of context can affect student motivation, identity, and participation in powerful ways, particularly when connected to issues students care about. For example, a 6th-grade class might analyse local weather data as part of an integrated science and mathematics unit, while a secondary class might explore social media trends to examine digital behaviours or misinformation. Aligning the context of these data science tasks with students’ lived experiences and curricular goals increases relevance and, subsequently, engagement.

6.2. Key Ideas in the K–12 Curriculum

While data science is broader than statistics, statistical thinking remains foundational. PSTs must be able to recognise and build on key statistical concepts that appear in national or regional curricula. These may include types of data (categorical, numerical), data collection methods and sampling, descriptive statistics (mean, median, variability), data visualisation, distributions and patterns, relationships and correlation, inferential reasoning and prediction, and concepts of bias and uncertainty. By connecting data science activities to existing curriculum content, teachers can more easily integrate them into lessons and assessment practices, while also extending students’ conceptual understanding.

In elementary settings, students might learn to distinguish categorical from numerical data using classroom surveys or weather data. In secondary classrooms, they could engage in exploratory analysis of sports analytics or economics datasets using scatterplots and begin to model relationships using regression or informal inferential reasoning.
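The two classroom examples above can be sketched in a few lines of Python. The survey answers, training hours, and points below are invented for illustration, and the helper names `variable_type`, `least_squares_line`, and `predict` are hypothetical; the sketch uses only the standard library and is not drawn from any of the curricula reviewed.

```python
from statistics import mean


def variable_type(values):
    """Label a column 'numerical' if every value parses as a number.

    This is a deliberately crude heuristic: codes such as postal codes look
    numeric but are categorical, which is itself a point worth discussing.
    """
    try:
        for value in values:
            float(value)
    except ValueError:
        return "categorical"
    return "numerical"


def least_squares_line(x, y):
    """Return (slope, intercept) of the ordinary least-squares line y = a*x + b."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(slope, intercept, x_new):
    """Use the fitted line to make a prediction for a new x value."""
    return slope * x_new + intercept


# Hypothetical classroom survey columns (elementary example).
favourite_sport = ["football", "hurling", "tennis", "football"]
height_cm = ["132", "128.5", "140", "135"]
print(variable_type(favourite_sport), variable_type(height_cm))

# Hypothetical sports data (secondary example): weekly training hours
# and average points scored per game.
hours = [2.0, 3.0, 4.0, 5.0, 6.0]
points = [8.0, 11.0, 13.0, 17.0, 19.0]
slope, intercept = least_squares_line(hours, points)
print(round(slope, 2), round(intercept, 2))
print(round(predict(slope, intercept, 7.0), 2))
```

In class, the fitted line would normally be read alongside a scatterplot of the same data, so that students can judge informally how well a straight line describes the relationship before making predictions from it.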

6.3. Tools

Data science instruction is often supported by digital tools. Tools must be chosen carefully and should be developmentally appropriate, accessible, and relevant to the learning goals. Some tools support data visualisation and exploration, others enable statistical analysis, and still others are useful for programming, cleaning, or modelling data. Useful tools may include spreadsheets (e.g., Excel, Google Sheets), programming languages (e.g., R, Python with Jupyter Notebooks), visualisation platforms (e.g., CODAP, Tableau Public, Datawrapper), and sources of real-world datasets (e.g., Kaggle, Gapminder, government portals). Teachers do not need to be experts in all tools but should be comfortable enough to guide students and make choices that align with learning outcomes. For younger students, tools such as CODAP offer low-floor entry to data analysis and basic visualisation without requiring programming. In contrast, high school students might use Python (via Jupyter Notebooks) to engage with larger or more complex datasets. Choosing among such tools involves balancing teacher confidence and cognitive demand against technical accessibility.

6.4. Assessment Practices

Assessment in data science education should capture not just students’ technical skills, but also their ability to think critically with data, communicate findings, and engage in ethical reasoning. Traditional assessments may have a place, but more authentic and performance-based assessments are often better suited. Ideally, assessment strategies should reflect the iterative and investigative nature of data science work, providing space for creativity, revision, and reflection. Possibilities include individual or group projects, written reports or presentations, interactive data visualisations, data interpretation tasks, portfolios, peer assessment, self-assessment, and rubric-based evaluations of inquiry, reasoning, or ethical decision-making.

In classrooms with younger learners, assessment might focus on interpreting bar charts, making a claim using data, or describing patterns in data aloud or in writing. In secondary settings, students might design a data investigation and communicate their findings through infographics or data reports, which are then assessed using rubrics that value technical skill, reasoning, clarity, and ethical considerations.

By considering these four dimensions—Context, Curriculum, Tools, and Assessment—PSTs and teacher educators can better prepare to implement meaningful data science education. These considerations complement the EDUCATE framework by offering practical guidance on bringing the framework to life in diverse classroom settings. Ultimately, the goal is not only to teach data science content, but also to empower teachers and students to engage thoughtfully, ethically, and critically with data, skills that are increasingly essential in today’s world.

7. Summary and Conclusions

This paper has explored the urgent need to embed data science education into initial teacher education. The systematic literature review and framework analysis suggest that, while data science is increasingly central to societal and professional life, its integration into teacher preparation remains fragmented and underdeveloped. The review highlights key processes and practices of data science that are essential for pre-service teachers, not only to understand data science as learners but also to teach it effectively in school contexts. These include data preparation, exploratory analysis, modelling, visualisation, and sustained engagement with ethical issues and data governance. A notable emphasis across the literature is the importance of domain knowledge, interdisciplinary collaboration, and communication skills, positioning data science as a deeply contextual and socially relevant field.

Drawing from the findings, the EDUCATE framework was developed to support both the doing and teaching of data science in pre-service teacher education. It provides a flexible, practice-informed structure that aligns data science processes with critical pedagogical considerations, offering a pathway for teacher educators to meaningfully incorporate data science into their programmes. The framework acknowledges the complexity and dynamic nature of data science, while remaining accessible and relevant for pre-service teachers who are not data science specialists.

In addition to the conceptual framework, the paper highlights key pedagogical considerations for implementation, including context, key curriculum ideas, tools, and assessment. These are not intended as instructional prescriptions, but rather as guiding prompts to support pre-service teachers preparing to bring data science into their future classrooms. They complement the framework by addressing pedagogical concerns such as selecting meaningful contexts, aligning with curriculum goals, choosing appropriate tools, and assessing student understanding in authentic ways. Such guidance is particularly important given the distinctive challenges of teaching data science, especially the use of complex, real-world datasets that differ significantly from those typically used in statistics education.

Taken together, the framework and the accompanying considerations contribute to ongoing efforts to develop data-literate teachers who can foster critical, inquiry-based, and ethically grounded engagements with data in their future classrooms. The EDUCATE framework is intended to be flexible and adaptable, allowing teacher educators to emphasise different processes and practices depending on grade level, curricular alignment, and available resources. In doing so, it provides a shared language and structure for thinking about data science education in schools, while leaving space for professional judgement and contextual adaptation. As the digital and data landscapes continue to evolve, supporting teachers to navigate data science thoughtfully and responsibly will remain central to preparing students for participation in an increasingly data-intensive world.

Author Contributions

All listed authors A.L., S.K., S.P. and D.F. have contributed to the study conceptualization; methodology; investigation; data curation; writing—original draft preparation; writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This work is funded by the EU under the Erasmus+ Key Action 2 programme (Project 2022–1 DE01-KA220-HED-000160333-Promoting Data Science Education for Teacher Education at the University level, DataSETUP). Any opinions, findings, and conclusions or recommendations presented in this paper are those of the authors and do not necessarily reflect those of the EU.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data sharing is not applicable.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

Adisa, I. O., Herro, D., Abimbade, O., & Arastoopour Irgens, G. (2024). Engaging elementary students in data science practices. Information and Learning Sciences, 125(7/8), 513–544.
Andersson, C. H., & Register, J. T. (2023). An examination of pre-service mathematics teachers’ ethical reasoning in big data with considerations of access to data. The Journal of Mathematical Behavior, 70, 101029.
Bach, T. A., Khan, A., Hallock, H., Beltrão, G., & Sousa, S. (2022). A systematic literature review of user trust in AI-enabled systems: An HCI perspective. International Journal of Human–Computer Interaction, 40(5), 1251–1266.
Baig, M. I., Shuib, L., & Yadegaridehkordi, E. (2020). Big data in education: A state of the art, limitations, and future research directions. International Journal of Educational Technology in Higher Education, 17(1), 44.
Bata, K., Schmitz, A., & Eichler, A. (2025, February 3–7). How students justify the usability of self-built machine learning models. Congress of European Research in Mathematics Education, Bolzano, Italy.
Batanero, C., Burrill, G., & Reading, C. (2011). Overview: Challenges for teaching statistics in school mathematics and preparing mathematics teachers. In C. Batanero, G. Burrill, & C. Reading (Eds.), Teaching statistics in school mathematics-challenges for teaching and teacher education: A joint ICMI/IASE study (pp. 407–418). Springer Science+Business Media B.V.
Baumer, B. (2015). A data science course for undergraduates: Thinking with data. The American Statistician, 69(4), 334–342.
Baumer, B. S., Garcia, R. L., Kim, A. Y., Kinnaird, K. M., & Ott, M. Q. (2022). Integrating data science ethics into an undergraduate major: A case study. Journal of Statistics and Data Science Education, 30(1), 15–28.
Ben-Zvi, D., Gravemeijer, K., & Ainley, J. (2017). Design of statistics learning environments. In International handbook of research in statistics education (pp. 473–502). Springer International Publishing.
Biehler, R., Frischemeier, D., Podworny, S., Wassong, T., Budde, L., Heinemann, B., & Schulte, C. (2018). Data science and big data in upper secondary schools: A module to build up first components of statistical thinking in a data science curriculum. Archives of Data Science, Series A, 5(1), 1–19.
Boenig-Liptsin, M., Tanweer, A., & Edmundson, A. (2022). Data Science Ethos Lifecycle: Interplay of ethical thinking and data science practice. Journal of Statistics and Data Science Education, 30(3), 228–240.
Bronfenbrenner, U., & Morris, P. A. (2007). The bioecological model of human development. In W. Damon, & R. M. Lerner (Eds.), Handbook of child psychology vol. 1: Theoretical models of human development (6th ed., pp. 793–828). Wiley.
Burr, W., Chevalier, F., Collins, C., Gibbs, A. L., Ng, R., & Wild, C. J. (2021). Computational skills by stealth in introductory data science teaching. Teaching Statistics, 43, S34–S51.
Ceccucci, W., Tamarkin, D., & Jones, K. (2015). The effectiveness of data science as a means to achieve proficiency in scientific literacy. Information Systems Education Journal, 13(4), 64–70.
Chance, B., Ben-Zvi, D., Garfield, J., & Medina, E. (2007). The role of technology in improving student learning of statistics. Technology Innovations in Statistics Education, 1(1), 2.
Cleveland, W. S. (2001). Data science: An action plan for expanding the technical areas of the field of statistics. International Statistical Review, 69(1), 21–26.
Çetinkaya-Rundel, M., Dogucu, M., & Rummerfield, W. (2022). The 5Ws and 1H of term projects in the introductory data science classroom. Statistics Education Research Journal, 21(2), 4.
Çetinkaya-Rundel, M., & Ellison, V. (2021). A fresh look at introductory data science. Journal of Statistics and Data Science Education, 29, S16–S26.
Datta, S., & Nagabandi, V. (2017, November 23–24). Integrating data science and R programming at an early stage. 2017 IEEE 4th International Conference on Soft Computing & Machine Intelligence (ISCMI) (pp. 1–5), Mauritius. Available online: https://ieeexplore.ieee.org/document/8279587 (accessed on 30 January 2026).
Davidson, A. (2024). A review of the use of investigative projects in statistics and data science courses. Journal of Statistics and Data Science Education, 32(2), 188–201.
De Vetten, A., Keijzer, R., & Schoonenboom, J. (2023). Pre-service primary school teachers’ knowledge during teaching informal statistical inference. Statistics Education Research Journal, 22(2), 1–16.
Dichev, C., & Dicheva, D. (2017). Towards data science literacy. Procedia Computer Science, 108, 2151–2160.
Dogucu, M., & Çetinkaya-Rundel, M. (2021). Web scraping in the statistics and data science curriculum: Challenges and opportunities. Journal of Statistics and Data Science Education, 29, S112–S122.
Donoho, D. (2017). 50 years of data science. Journal of Computational and Graphical Statistics, 26(4), 745–766.
European Commission. (2019). A Europe fit for the digital age. Available online: https://commission.europa.eu/strategy-and-policy/priorities-2019-2024/europe-fit-digital-age_en (accessed on 1 January 2026).
European Commission. (2025). Digital skills and jobs—Shaping Europe’s digital future. European Commission. Available online: https://digital-strategy.ec.europa.eu/en/policies/digital-skills (accessed on 1 January 2026).
Finzer, W. (2013). The data science education dilemma. Technology Innovations in Statistics Education, 7(2). Available online: https://escholarship.org/content/qt7gv0q9dc/qt7gv0q9dc.pdf (accessed on 8 February 2026).
Flavin, E., & Suh, J. (2024). Centering empathy in a mathematics classroom. Mathematics Teacher: Learning and Teaching PK-12, 117(5), 361–370.
Franklin, C., Kader, G., Mewborn, D., Moreno, J., Peck, R., Perry, M., & Scheaffer, R. (2007). Guidelines for assessment and instruction in statistics education (GAISE) report. American Statistical Association.
Friedrich, A., Schreiter, S., Vogel, M., Becker-Genschow, S., Brünken, R., Kuhn, J., Lehmann, J., & Malone, S. (2024). What shapes statistical and data literacy research in K-12 STEM education? A systematic review of metrics and instructional strategies. International Journal of STEM Education, 11(1), 58.
Friel, S. N., Curcio, F. R., & Bright, G. W. (2001). Making sense of graphs: Critical factors influencing comprehension and instructional implications. Journal for Research in Mathematics Education, 32(2), 124–158.
Fry, K., & Makar, K. (2021). How could we teach data science in primary school? Teaching Statistics, 43, S173–S181.
Garfield, J., & Ben-Zvi, D. (2008). Developing students’ statistical reasoning: Connecting research and teaching practice. Springer Science & Business Media.
Gehrke, M., Kistler, T., Lübke, K., Markgraf, N., Krol, B., & Sauer, S. (2021). Statistics education from a data-centric perspective. Teaching Statistics, 43, S201–S215.
González, O. (2025, February 4–8). Informal conceptions of appropriate choropleth maps held by inservice East and Southeast Asian teachers. Proceedings of the Fourteenth Congress of the European Society for Research in Mathematics Education (CERME14) (No. 11), Bozen-Bolzano, Italy.
Gould, R. (2017). Data literacy is statistical literacy. Statistics Education Research Journal, 16(1), 22–25.
Gould, R. (2021). Toward data-scientific thinking. Teaching Statistics, 43(1), 11–22.
Gould, R., Machado, S., Johnson, T. A., & Molyneux, J. (2022). Introduction to data science curriculum. Available online: https://curriculum.idsucla.org/ (accessed on 1 January 2026).
Gould, R., Machado, S., Ong, C., Johnson, T., Molyneux, J., Nolen, S., Tangmunarunkit, H., Trusela, L., & Zanontian, L. (2016). Teaching data science to secondary students: The mobilize introduction to data science curriculum. Iase-Web. Org. Available online: https://fpce.uc.pt/iase-web/documents/papers/rt2016/Gould.pdf (accessed on 30 January 2026).
Graham, A. T. (1987). Statistical investigations in the secondary school. Cambridge University Press.
Green, R. A. (2014). The Delphi technique in educational research. Sage Open, 4(2), 2158244014529773.
Groth, R., & Meletiou-Mavrotheris, M. (2017). Research on statistics teachers’ cognitive and affective characteristics. In International handbook of research in statistics education (pp. 327–355). Springer International Publishing.
Hardin, J., Hoerl, R., Horton, N. J., Nolan, D., Baumer, B., Hall-Holt, O., Murrell, P., Peng, R., Roback, P., Temple Lang, D., & Ward, M. D. (2015). Data science in statistics curricula: Preparing students to “think with data”. The American Statistician, 69(4), 343–353.
Higgins, T., Rubin, A., Mokros, J., Sagrans, J., & Ren-Mitchell, A. (2022, April 26–28). When the data drive the learning. 2021 International Association for Statistical Education Satellite Conference, Krakow, Poland.
Horton, N. J., & Hardin, J. S. (2021). Integrating computing in the statistics and data science curriculum: Creative structures, novel skills and habits, and ways to teach computational thinking. Journal of Statistics and Data Science Education, 29, S1–S3.
IDSSP Curriculum Team. (2019). Curriculum frameworks for introductory data science. Available online: http://idssp.org/files/IDSSP_Frameworks_1.0.pdf (accessed on 1 January 2026).
Kalaian, S. A., & Kasim, R. M. (2014). A meta-analytic review of studies of the effectiveness of small-group learning methods on statistics achievement. Journal of Statistics Education, 22(1).
Kauermann, G., & Seidl, T. (2018). Data science: A proposal for a curriculum. International Journal of Data Science and Analytics, 6(3), 195–199.
Keller, S. A., Shipp, S. S., Schroeder, A. D., & Korkmaz, G. (2020). Doing data science: A framework and case study. Harvard Data Science Review, 2(1).
Kitchin, R., & McArdle, G. (2016). What makes big data, big data? Exploring the ontological characteristics of 26 datasets. Big Data & Society, 3(1), 2053951716631130.
Kjelvik, M. K., & Schultheis, E. H. (2019). Getting messy with authentic data: Exploring the potential of using data from scientific research to support student data literacy. CBE—Life Sciences Education, 18(2), es2.
LaMar, T., & Boaler, J. (2021). The importance and emergence of K-12 data science. Phi Delta Kappan, 103(1), 49–53.
Leavy, A., & Frischemeier, D. (2022). Developing the statistical problem posing and problem refining skills of prospective teachers. Statistics Education Research Journal, 21(1), 10.
Leavy, A., Frischemeier, D., Kazak, S., Gonzales, O., Lamanna, L., & Gea, M. (2025). Reimagining statistics and probability education in the age of data and AI: An introduction to the work of TWG05 Leader. In Fourteenth Congress of the European Society for Research in Mathematics Education (CERME14), Free University of Bozen-Bolzano, ERME, Bozen-Bolzano, Italy, February 4–8; Report of the TWG05 working group of CERME. Available online: https://hal.science/hal-05330897 (accessed on 30 January 2026).
Lee, H., Mojica, G., Thrasher, E., & Baumgartner, P. (2022). Investigating data like a data scientist: Key practices and processes. Statistics Education Research Journal, 21(2), 3.
Lee, V. R., & Delaney, V. (2022). Identifying the content, lesson structure, and data use within pre-collegiate data science curricula. Journal of Science Education and Technology, 31(1), 81–98.
Li, Y., Wang, Y., Lee, Y., Chen, H., Petri, A. N., & Cha, T. (2023). Teaching data science through storytelling: Improving undergraduate data literacy. Thinking Skills and Creativity, 48, 101311.
Lieben, C., & Gürtler, S. (2025, February 4–8). Bringing math into politics: The Decision Theatre Lab and the influence of model knowledge on the interpretation of stochastic simulation results. Fourteenth Congress of the European Society for Research in Mathematics Education (CERME14) (No. 16), Bolzano, Italy.
Liston, M., Morrin, A. M., Furlong, T., & Griffin, L. (2022, August). Integrating data science and the internet of things into science, technology, engineering, arts, and mathematics education through the use of new and emerging technologies. In Frontiers in education (Vol. 7, p. 757866). Frontiers Media SA.
Msweli, N. T., Mawela, T., & Twinomurinzi, H. (2023). Transdisciplinary teaching practices for data science education: A comprehensive framework for integrating disciplines. Social Sciences & Humanities Open, 8(1), 100628.
National Academies of Sciences, Engineering, and Medicine. (2018). Data science for undergraduates: Opportunities and options. National Academies Press.
Nolan, D., & Temple Lang, D. (2010). Computing in the statistics curricula. The American Statistician, 64(2), 97–107.
Oliver, J. C., & McNeil, T. (2021). Undergraduate data science degrees emphasize computer science and statistics but fall short in ethics training and domain-specific context. PeerJ Computer Science, 7, e441.
Overton, M., & Kleinschmit, S. (2022). Data science literacy: Toward a philosophy of accessible and adaptable data science skill development in public administration programs. Teaching Public Administration, 40(3), 354–365.
Ow-Yeong, Y. K., Yeter, I. H., & Ali, F. (2023). Learning data science in elementary school mathematics: A comparative curriculum analysis. International Journal of STEM Education, 10(1), 8.
Redecker, C. (2017). European framework for the digital competence of educators: DigCompEdu (Y. Punie, Ed.). EUR 28775 EN. Publications Office of the European Union.
Ridgway, J. (2016). Implications of the data revolution for statistics education. International Statistical Review, 84(3), 528–549.
Ridgway, J. (2022). Why engage with civic statistics? In Statistics for empowerment and social engagement: Teaching civic statistics to develop informed citizens (pp. 37–66). Springer International Publishing.
Roseth, C. J., Garfield, J. B., & Ben-Zvi, D. (2008). Collaboration in learning and teaching statistics. Journal of Statistics Education, 16(1), 1–15.
Schanzer, E., Pfenning, N., Denny, F., Dooman, S., Politz, J. G., Lerner, B. S., Fisler, K., & Krishnamurthi, S. (2022, February 22). Integrated data science for secondary schools: Design and assessment of a curriculum. 53rd ACM Technical Symposium on Computer Science Education-Volume 1 (pp. 22–28), New York, NY, USA.
Schönbrodt, S., & Frank, M. (2025, February 4–8). Teaching data-driven machine learning in mathematics education. Fourteenth Congress of the European Society for Research in Mathematics Education (CERME14) (No. 19), Bolzano, Italy.
Schreiter, S., Friedrich, A., Fuhr, H., Malone, S., Brünken, R., Kuhn, J., & Vogel, M. (2024). Teaching for statistical and data literacy in K-12 STEM education: A systematic review on teacher variables, teacher education, and impacts on classroom practice. ZDM–Mathematics Education, 56(1), 31–45.
Shaughnessy, J. M. (2007). Research on statistics learning and reasoning. In Second handbook of research on mathematics teaching and learning (pp. 957–1009). Available online: https://www.scribd.com/document/534324588/2007-Shauggenesy (accessed on 30 January 2026).
Theobold, A. S. (2020). Supporting data intensive environmental science research: Data science skills for scientific practitioners of statistics [Ph.D. thesis, Montana State University].
Thompson, J., & Arastoopour Irgens, G. (2022). Data detectives: A data science program for middle grade learners. Journal of Statistics and Data Science Education, 30(1), 29–38.
Thrasher, E., Lee, H. S., Mojica, G. F., & Graham, B. (2024). Making sense of data visualizations: A toolkit for supporting student discussions. Statistics Teacher, (Fall 2024). Available online: https://par.nsf.gov/biblio/10591343-making-sense-data-visualizations-toolkit-supporting-student-discussions (accessed on 30 January 2026).
Vance, E. A. (2021). Using team-based learning to teach data science. Journal of Statistics and Data Science Education, 29(3), 277–296.
Watson, J., Fitzallen, N., Fielding-Wells, J., & Madden, S. (2017). The practice of statistics. In International handbook of research in statistics education (pp. 105–137). Springer International Publishing.
Weiland, T., & Engledowl, C. (2022). Transforming Curriculum and Building Capacity in K–12 Data Science Education. Harvard Data Science Review, 4(4).
Wild, C. J., & Pfannkuch, M. (1999). Statistical thinking in empirical enquiry. International Statistical Review, 67(3), 223–248.
Wilkerson, M. H., Kim, J., Lee, H. S., Stokes, D. J., & Ferrell, M. (2025). How teachers envision using data visualization discussion tasks in classroom instruction. International Journal of Science and Mathematics Education, 23(7), 2653–2687.
Wu, D., Xu, H., Sun, Y., & Lv, S. (2023). What should we teach? A human-centered data science graduate curriculum model design for iField schools. Journal of the Association for Information Science and Technology, 74(6), 623–640.
Xu, Z., Tang, N., Xu, C., & Cheng, X. (2021). Data science: Connotation, methods, technologies, and development. Data Science and Management, 1(1), 32–37.
Yan, D., & Davis, G. E. (2019). A first course in data science. Journal of Statistics Education, 27(2), 99–109.

Figure 1. PRISMA flow diagram.

Figure 2. Simplified schematic of the data cycle illustrated in the IDS curriculum (Gould et al., 2022).

Figure 3. Simplified schematic of the data processes in the IDSSP framework (IDSSP Curriculum Team, 2019).

Figure 4. Adapted outer structure of the Ow-Yeong et al. (2023) framework.

Figure 5. Adapted schematics of the Data science ethos lifecycle (Boenig-Liptsin et al., 2022).

Figure 6. Adapted six-phase data science cycle from H. Lee et al. (2022).

Figure 7. The EDUCATE Data Science Conceptual Framework for Teachers—A focus on Data Science Processes and Practices.

Table 1. Search results from education databases.

Topic	Database	Number of Results
Data science education and literacy	ERIC (via EBSCO)	56
	Science Direct	427
	Web of Science	347
	Total	830
Pre-service teachers’ data literacy	ERIC (via ProQuest)	579
	Science Direct	94
	Web of Science	28
	Total	701

Table 2. Mapping of Data Science Components to Underlying Literature.

Component	Brief Description	Representative Sources	Notes on Emphasis/Variation
Problem Formulation and Question Development	Framing investigable, data-driven questions that account for complexity and ethical implications	Donoho (2017); Dichev and Dicheva (2017); H. Lee et al. (2022); Leavy and Frischemeier (2022)	Strongly emphasised across DS frameworks; overlaps with statistical inquiry cycles but expanded to include ethical and societal considerations
Data Collection, Preparation and Processing	Working with real-world, messy, structured and unstructured data; cleaning and transforming data	Donoho (2017); Kjelvik and Schultheis (2019); B. Baumer (2015); Keller et al. (2020); H. Lee et al. (2022)	Central distinguishing feature of DS compared to school statistics; degree of emphasis varies by educational level
Data Infrastructure and Big Data Technologies	Use of databases, cloud computing, and computational infrastructure to store and process data	Baig et al. (2020); Cleveland (2001); National Academies of Sciences, Engineering, and Medicine (2018); Kitchin and McArdle (2016); H. Lee et al. (2022)	Prominent in professional DS frameworks; less explicit in school-level curricula
Data Governance, Ethics and Responsible Use	Addressing bias, privacy, data ownership, consent, and ethical use of data	National Academies of Sciences, Engineering, and Medicine (2018); Dichev and Dicheva (2017); Oliver and McNeil (2021); Andersson and Register (2023)	Increasingly emphasised; discussions of ethics are not uniform, and ethical framing varies from procedural to justice-oriented
Exploratory Data Analysis (EDA)	Descriptive statistics, pattern detection, inference, and relationship exploration	H. Lee et al. (2022); Wild and Pfannkuch (1999)	Strong continuity with statistics education; expanded role in DS for open-ended exploration
Data Visualisation and Visual Analytics	Interactive and dynamic visualisation for exploration, sense-making, and communication	Gould et al. (2016); Çetinkaya-Rundel and Ellison (2021); Burr et al. (2021)	Visualisation often positioned as an entry point to DS; greater interactivity than in traditional statistics
Computational Thinking, Modelling, Algorithms, Machine Learning	Algorithmic thinking, programming, predictive modelling, and machine learning	Nolan and Temple Lang (2010); Hardin et al. (2015); Donoho (2017); Dichev and Dicheva (2017); Gehrke et al. (2021); Gould (2021); Burr et al. (2021); Çetinkaya-Rundel and Ellison (2021); Horton and Hardin (2021)	Level of computational sophistication varies widely across frameworks and DS programmes
Statistical and Mathematical Foundations	Probability, inference, linear algebra underpinning DS methods	Hardin et al. (2015); National Academies of Sciences, Engineering, and Medicine (2018); Gehrke et al. (2021); Overton and Kleinschmit (2022); Gould (2021)	Foundations acknowledged but weighted differently across DS vs. statistics traditions
Domain Knowledge and Interdisciplinary Collaboration	Integration of disciplinary expertise to ensure meaningful interpretation	Ceccucci et al. (2015); Hardin et al. (2015); Theobold (2020); Xu et al. (2021); Msweli et al. (2023)	Central to professional DS practice; less explicit in school curricula
Collaboration and Communication Skills	Teamwork, stakeholder interaction, and communication of uncertainty and findings	Çetinkaya-Rundel et al. (2022); H. Lee et al. (2022); National Academies of Sciences, Engineering, and Medicine (2018)	Consistently emphasised across professional and educational frameworks

Table 3. Data science practices.

Data Science Practice	Description
Data preparation and processing	… involves preparing data for analysis. Depending on the structure and source of the data set, this may involve reformatting and cleaning the data (removing duplicates, filling in missing values, and correcting inaccuracies and inconsistencies), transforming the data by converting data types or deriving new variables, and integrating the data by combining data sets.
Big data technologies	… the increasing accessibility and use of big data in schools demands that teachers understand what big data is and its potential both to offer deeper insights into issues that are relevant to our world and to enhance their data science instruction. For example, knowledge of cloud computing and associated platforms enables teachers to access big data technologies without significant upfront investment in hardware, allowing for more accessible data science projects.
Developing Statistical Questions	… access to open, large, messy and complex data sets poses challenges for the development of research questions that are sufficiently focused and well defined. It can be challenging to formulate questions that can be effectively answered given the available data and that account for the complexity of the data without oversimplifying or overcomplicating the problem. There is also the need to navigate the ethical and broader impacts of the research.
Data governance and ethics	… where students collect their own data, an understanding of informed consent, anonymization, and respect for participants’ privacy is necessary. When secondary data are sourced, critical evaluation of the sources is required, along with consideration of the reliability, biases, and ethical implications of using data from different sources. Attention to bias is important, covering both students’ own personal and cultural biases and the biases built into data sets. The use of open data requires consideration of the ethical implications of using public data. Use of data for social good, for example analysing public health data or environmental data, can develop appreciation for projects that use data for positive social impact.
Exploratory data analysis	… involves the selection, calculation and interpretation of appropriate descriptive statistics to summarise and describe the data, and correlational analysis to explore relationships within the data. It also requires the use of visualisations and statistical methods to identify patterns, trends and outliers, to facilitate understanding of the underlying structure of the data, and to answer the question originally formulated.
Data visualisation	… involves revealing and communicating insights and findings from data through the use of data visualisation tools and techniques. This can be facilitated through the creation and interpretation of basic charts and graphs, pivot tables, and interactive visualisations.
Computational Thinking	… involves breaking down complex problems into manageable parts, recognising patterns, and developing algorithms to process and analyse data efficiently. Key practices include decomposition, involving dissecting problems into smaller tasks; pattern recognition, which involves identifying trends or anomalies in data; and abstraction, which focuses on filtering out irrelevant information to highlight core aspects of the data. Additionally, algorithmic thinking is essential for designing step-by-step procedures that enable data analysis, while evaluation involves assessing solutions and refining approaches. Programming languages like Python and R play a crucial role in applying these practices, providing tools and libraries that allow us to automate processes, handle large datasets, and implement complex analyses effectively.
Modelling	… is the process of creating a mathematical representation of a real-world situation using data. A model can be a mathematical equation, a statistical algorithm, or a machine learning algorithm that describes how different variables in the data relate to each other. The development of a model helps capture patterns, relationships, or trends within the data, allowing pre-service teachers to make predictions and inferences and understand underlying processes.
Collaborative and communication skills	… data science is not only a technical field but also one that thrives on effective teamwork and communication. Students equipped with these skills are better positioned to succeed in real-world settings. Consequently, it is essential for teachers to develop students’ collaborative and communication skills. This prepares students to work with diverse teams, to communicate findings from complex data sets clearly and concisely, to document their data science process, and to interact with stakeholders to define project goals. It also prepares them to use the language of uncertainty when there is no clear right or wrong answer, and to collaborate in making responsible decisions when addressing ethical concerns around data privacy, bias, and fairness in data science projects.
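
To make three of the practices in Table 3 concrete (data preparation and processing, exploratory data analysis, and modelling), the following is a minimal illustrative sketch in Python, one of the languages named in the table. It is not drawn from the reviewed literature or from the EDUCATE framework itself. The data set, the variable names, and the choice of median imputation and a least-squares line are all assumptions made for illustration, and only the Python standard library is used.

```python
from statistics import mean, median

# Hypothetical classroom records (invented for illustration):
# (pupil_id, hours_of_sleep, reading_minutes), all read in as text.
raw_rows = [
    ("p01", "8.0", "30"),
    ("p02", "7.5", "25"),
    ("p02", "7.5", "25"),  # exact duplicate record
    ("p03", "", "20"),     # missing sleep value
    ("p04", "9.0", "40"),
    ("p05", "6.5", "15"),
]


def prepare(rows):
    """Data preparation: drop duplicates, convert types, fill missing values."""
    unique = list(dict.fromkeys(rows))  # keeps first occurrence, preserves order
    parsed = [
        (pid, float(sleep) if sleep else None, int(minutes))
        for pid, sleep, minutes in unique
    ]
    known = [sleep for _, sleep, _ in parsed if sleep is not None]
    fill = median(known)  # assumption: impute with the median of known values
    return [
        (pid, fill if sleep is None else sleep, minutes)
        for pid, sleep, minutes in parsed
    ]


def pearson_r(xs, ys):
    """Exploratory analysis: Pearson correlation between two variables."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def fit_line(xs, ys):
    """Modelling: least-squares line y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


clean = prepare(raw_rows)
sleep = [row[1] for row in clean]
reading = [row[2] for row in clean]

r = pearson_r(sleep, reading)
slope, intercept = fit_line(sleep, reading)

print(f"records after cleaning: {len(clean)}")
print(f"mean reading minutes: {mean(reading)}")
print(f"correlation (sleep vs reading): {r:.2f}")
print(f"fitted line: reading = {slope:.1f} * sleep + {intercept:.1f}")
```

Each step involves a decision that pre-service teachers could discuss rather than accept. Whether a repeated record is a true duplicate, whether the median is a defensible fill value, and whether a straight line is a reasonable model for five observations are all open questions. With a data set this small, the correlation and the fitted line illustrate the procedure only and support no substantive conclusion.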

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
