1. Introduction
Data Science has emerged as a critical interdisciplinary field that connects statistics, computer science, and application sciences (
Biehler et al., 2018). It permeates all aspects of modern society, influencing domains such as healthcare, finance, education, and public policy. The rapid expansion of data availability and the increasing reliance on data-driven decision-making necessitate the development of data literacy skills among citizens. As the European Union’s “A Europe Fit for the Digital Age” campaign highlights (
European Commission, 2019), fostering data competence is essential for democratic participation, informed decision-making, and responsible citizenship.
Despite the growing relevance of data science in K-12 education, there remains a significant gap in preparing teachers to effectively integrate data science into their curricula (
Gould, 2017,
2021). With the relatively recent emergence of data science (
Donoho, 2017), there are additional challenges placed on teachers including, but not limited to, the need for context knowledge, content and pedagogical content knowledge for teaching stochastics, the lack of availability of instructional materials and the need for access to and upskilling in technological tools to support the development of students’ data science understanding (
Batanero et al., 2011;
Leavy et al., 2025;
Wilkerson et al., 2025). Without adequate support and preparation, teachers may struggle to equip students with the necessary skills to critically analyse, interpret, and use data in meaningful ways. Addressing this gap is particularly urgent given the prevalence of misinformation, fake news, and the role of data in contemporary issues such as sustainability, migration, poverty and global pandemics.
The DataSETUP project responds to these challenges by promoting data science literacy in schools through enhanced teacher education. Specifically, the project seeks to integrate data science education into pre-service teacher preparation across multiple disciplines at the university level. By doing so, it intends to build teachers’ capacity to incorporate data-driven approaches in their teaching and foster students’ competencies in working with data. The project aligns with key European initiatives, including the European Digital Skills Agenda (
European Commission, 2025) and the Digital Competence Framework for Educators (DigCompEdu) (
Redecker, 2017), which emphasise the importance of digital competence for educators.
2. Importance of Incorporating Data Science in Teacher Education
Data Science is increasingly recognised as a fundamental component of 21st-century education, offering powerful tools for problem-solving, critical thinking, and decision-making. As data becomes an essential resource in all fields, equipping students with the ability to understand and analyse data is crucial for their future success (
Ridgway, 2022). Consequently, it is imperative that teacher education programmes integrate data science to ensure that pre-service teachers develop the knowledge and skills necessary to teach data literacy effectively.
For pre-service teachers, particularly those specialising in STEM disciplines, a strong foundation in data science is indispensable. Research suggests that such a foundation enables them to develop a deep understanding of data-driven reasoning, statistical thinking, and computational analysis (
Ridgway, 2016,
2022) by incorporating the skills necessary to integrate data analysis, simulations, and visualisation tools into their teaching practices (
Chance et al., 2007;
Finzer, 2013;
Wilkerson et al., 2025). It supports these future teachers to engage students in authentic, inquiry-based learning experiences that reflect real-world data applications (
Gould, 2017;
Weiland & Engledowl, 2022) and foster students’ ability to critically assess data sources, interpret findings, and make informed decisions (
Friedrich et al., 2024;
Schreiter et al., 2024).
Failure to integrate data science education into teacher preparation risks leaving educators unprepared to navigate the evolving landscape. Moreover, it may result in students lacking essential competencies for participation in a data-driven society. By embedding data science into teacher education, we can ensure that future educators are equipped with the necessary content knowledge, pedagogical strategies, digital tools, and conceptual understandings to teach data science effectively.
Despite growing recognition of the importance of data science in education, there is no coherent conceptual framework to guide how data science should be taught in initial teacher education. Existing frameworks and curricula largely target K-12 learners or focus on disciplinary perspectives such as statistics or computer science, leaving teacher educators without clear guidance on the processes and practices that pre-service teachers need to engage in to become competent teachers of data science. This paper addresses this critical gap by presenting a conceptual framework specifically designed for teacher education—one that articulates the processes, practices, and competencies required for pre-service teachers to learn how to do data science and, in turn, support their future students. The framework developed through the DataSETUP project thus offers a much-needed foundation for structuring teacher education programmes, developing professional learning materials, and ensuring that pre-service teachers are prepared to foster meaningful data literacy in a data-intensive world.
3. Methods
Development of the conceptual framework was informed by a research design consisting of three integrated components. The first two components, a systematic scoping review and an expert-elicitation search, generated a body of readings. The third component, a structured analytic process, identified the key learnings from the readings to inform the development of the conceptual framework.
3.1. Systematic Scoping Review
The primary goal of this systematic literature review was to inform the development of the conceptual framework for data science in teacher education. It focused on research on data science as it pertains to data science literacy in education, existing data science curricula and frameworks, and data science in teacher education. Two separate but complementary searches were conducted:
- Data science education and literacy;
- Pre-service teachers’ data literacy.
Using a comprehensive set of search terms and filters, searches were undertaken across the following academic databases: ERIC (via EBSCO and ProQuest), Science Direct, and Web of Science. The searches were limited to English-language publications from 2013 to 2023. A variety of search phrase combinations were applied across these databases to ensure broad coverage (e.g., “data literacy” AND “teacher education”; “data science” AND “education”). This resulted in an initial pool of 1,531 articles. The search results are presented in
Table 1.
Duplicate records (n = 327) were removed using Zotero (version 6.0) reference management software, resulting in 1,204 unique records for screening. Title and abstract screening was guided by predefined inclusion/exclusion criteria. Screening was conducted by three reviewers, who began by coding a shared sample of 30 articles to ensure alignment. Thereafter, the reviewers independently screened the remaining articles. Screening inclusion criteria differed slightly for the two search strands:
For “data science education/literacy,” studies were included if they: (1) reported on data science education or literacy; (2) involved pre-service or in-service teachers; (3) ideally referenced a framework, model, or curriculum; and (4) were situated in an educational context.
For “pre-service teacher data literacy,” studies were included if they: (1) reported on data literacy; (2) involved teacher education students; (3) ideally referenced a framework, model, or curriculum; and (4) were situated in an educational context.
This screening yielded 134 papers for full-text review. A further 21 were excluded following full-text reading due to:
Focus on historical policy accounts;
Emphasis on bespoke data management or software evaluation;
Narrative opinion pieces without empirical data;
An exclusive focus on computational thinking or programming without reference to data science.
All full-text reviews were completed by research teams working in pairs or small groups.
3.2. Expert-Informed Literature Identification Strategy
To complement the database-driven search, an expert-informed literature identification strategy was employed. This drew on a practice commonly used in Delphi studies: systematically consulting recognised experts to identify influential literature in emerging fields. Expert input is particularly valuable in rapidly developing areas such as data science, where research may be dispersed across disciplines and not yet fully indexed in academic databases, and it is especially useful when developing curricula and learning experiences (
Green, 2014). To generate the expert pool, members of the research team nominated individuals with established expertise in data science or statistics education. A total of 22 experts were contacted via email and invited to “share with us what you perceive as the three most interesting and relevant papers about data science literacy or data literacy in teacher education (only for teachers or pre-service teachers), including any papers you may have authored yourself.” Responses and associated papers were collated. Further hand searches were carried out in selected journals of relevance, including
ZDM,
Statistics Education Research Journal (SERJ), and
The Journal of Statistics and Data Science Education.
These combined efforts from the expert-elicitation search and the hand search of journals yielded an additional 88 studies. Of these, 27 were duplicates (already identified in the initial pool), and 39 were excluded based on the same inclusion/exclusion criteria described earlier. The remaining 22 were included for full analysis, bringing the total number of studies analysed to 135.
Figure 1 illustrates the PRISMA flow diagram outlining the identification, screening, and inclusion process for studies sourced through the systematic literature review (outlined in
Section 3.1), as well as those added via hand-searching and expert recommendations (as described here in
Section 3.2).
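The study counts reported in Sections 3.1 and 3.2 can be checked with simple arithmetic. The short Python sketch below is illustrative only and is not part of the review protocol; the variable names are our own, and every number is taken directly from the text above.

```python
# Counts reported in Sections 3.1 and 3.2 (variable names are illustrative).
database_records = 1531     # initial pool from the database searches
duplicates_removed = 327    # duplicate records removed using Zotero
full_text_reviewed = 134    # papers retained after title/abstract screening
full_text_excluded = 21     # papers excluded after full-text reading

expert_and_hand = 88        # studies from expert elicitation and hand searches
expert_duplicates = 27      # already identified in the initial pool
expert_excluded = 39        # excluded under the same criteria

unique_records = database_records - duplicates_removed
database_included = full_text_reviewed - full_text_excluded
expert_included = expert_and_hand - expert_duplicates - expert_excluded
total_analysed = database_included + expert_included

print(unique_records, database_included, expert_included, total_analysed)
# prints: 1204 113 22 135
```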
3.3. Structured Analytic Process
A set of guiding questions was developed to support researchers in extracting critical insights from the literature to inform the conceptual framework. The questions were designed to foreground disciplinary definitions, required prior knowledge, core data science concepts, and curricular approaches. They were intentionally broad to accommodate variation across the literature, yet specific enough to generate comparable insights:
- Question 1: How is data science described and defined?
- Question 2: What is the most important prior knowledge required for pre-service teachers?
- Question 3: What are the important data science concepts that pre-service teachers need to know?
- Question 4: How are approaches to teaching data science in curricula (school-level, undergraduate and initial teacher education) structured?
Guided by these questions, researchers read, summarised, and discussed the 222 extracted papers. Although all 222 papers were read, 87 studies were excluded at various stages during this process, leaving 135 for final synthesis. Each research team produced a written summary of findings, and these summaries were then integrated, compared, and refined collaboratively. Through iterative discussion, areas of convergence across the analyses were identified, leading to the development of two overarching themes that structure the presentation of findings in the following section. Approximately 60 of the 135 papers analysed made a unique and direct contribution to one or both themes and are cited in this paper. The remaining 75 studies offered supportive or contextual insights but did not contribute novel analytical findings.
The four guiding questions shaped the analytic process and are used to synthesise findings across the reviewed studies and frameworks. Questions 1 and 2, which concern how data science is described and what prior knowledge pre-service teachers require, were conceptually aligned and therefore informed the first major analytic theme: How Data Science is Described in the Literature. Questions 3 and 4, which focus on identifying essential data science concepts for teachers and examining how data science is structured in school, undergraduate, and teacher education curricula, naturally converged into the second analytic theme: How Data Science is Described and Structured in School Curricula and Initial Teacher Education. This amalgamation of four lines of inquiry into two higher-order themes reflects the way the literature clusters conceptually, allowing for a more integrated and coherent synthesis of findings.
3.4. Designing the EDUCATE Framework
The development of the EDUCATE framework followed a multi-phase, iterative process grounded in the findings of the systematic literature review. After identifying ten core components of data science practice (described in
Section 4.1), the research team engaged in several rounds of categorisation, drawing on established frameworks in both data science (e.g.,
H. Lee et al., 2022; InSTEP [
https://instepwithdata.org] accessed on 1 January 2026) and statistical inquiry. We examined how these components could be meaningfully grouped, seeking to identify recurring sequences or cycles of activity that define data science engagement.
This process led to the formulation of four overarching processes (Problem Formulation, Preparation, Analysis, and Interpretation & Communication) that represent the structural phases of a data science investigation. The remaining components were conceptualised as nine cross-cutting practices (e.g., computational thinking, ethical reasoning, and data visualisation) which support and enrich each stage of the process. This distinction was strongly informed by
H. Lee et al. (
2022), whose framework provided a clear model for differentiating between structural processes and embedded skills or habits of mind.
Early drafts of the model explored cyclical and nested structures, linking specific practices to particular processes. However, the team ultimately chose to represent the practices as flexible and non-linear. This structure preserves conceptual clarity while allowing teacher educators and pre-service teachers to adapt the framework to local contexts. Importantly, the EDUCATE framework is tailored to the needs of non-specialist K–12 educators. While it draws on professional data science models, we intentionally filtered out components that demand high levels of technical expertise or infrastructure (e.g., data infrastructure, web scraping). Instead, we retained elements that are both pedagogically meaningful and developmentally appropriate (for example, data ethics, visualisation, and basic statistical reasoning) thereby aligning with existing school curricula and teacher capabilities.
In addition to the core model, we identified four pedagogical considerations that influence how data science is taught in classrooms. These considerations were informed by literature and curriculum analysis and also reflect our acknowledgement of the broader ecological system in which teaching occurs (cf.
Bronfenbrenner & Morris, 2007). While not conceptual elements of data science itself, they serve as practical supports for implementation, helping educators plan for real-world constraints and opportunities in their specific settings. For this reason, they are presented separately in the “Beyond the Framework”
Section 6.
4. Findings
The first analytic theme, How Data Science is Described in the Literature, describes ten core components that recur across a wide range of data science frameworks and publications. This section is intentionally descriptive and conceptual in nature, aiming to provide readers with a coherent sense of the processes, practices, and forms of work involved in “doing” data science. As these components are consistently represented across the literature, they are presented here as an integrative synthesis rather than through attribution to individual sources. However, to enhance transparency and support traceability,
Table 2 maps each component to representative sources from the literature reviewed, highlighting areas of convergence and variation across emphases.
In contrast, the second theme, How Data Science is Described and Structured in School Curricula and Initial Teacher Education, draws explicitly on empirical studies, curricular frameworks, and policy-oriented publications. Accordingly, this section is more heavily referenced, as it reports how data science is interpreted, prioritised, and enacted within specific educational contexts. The distinction between the two sections reflects a deliberate analytic move: the first establishes a conceptual foundation for data science practice, while the second examines how this foundation is translated into educational structures for learners and pre-service teachers.
4.1. How Data Science Is Described in the Literature
Data science, often described as the “Science of learning from data” (
Donoho, 2017, p. 748), is an interdisciplinary field that draws on mathematics, statistics, computer science, and domain-specific knowledge to generate insights from data. While the field remains variably defined (
Gould, 2021), conceptual frameworks of data science consistently attempt to articulate the processes and practices involved in doing data science, from problem formulation through to analysis, interpretation and communication. Following a comprehensive review of data science frameworks and the related literature describing professional data science practice, ten core components were identified that recur across these descriptions. These components are presented as a synthesis of the work and practices involved in data science, rather than as a prescriptive or exhaustive taxonomy. As such, they provide a conceptual foundation for understanding what it means to engage in data science practice and form the basis for examining how these practices are taken up, adapted, and structured within educational contexts.
4.1.1. Problem Formulation and Question Development
Access to open, large, messy and complex data sets poses challenges for the development of research questions that are sufficiently focused and well defined. Developing this skill helps ensure that data science projects focus on relevant problems that can be answered with the available data, account for the complexity of the data without oversimplifying or overcomplicating the problem, extract actionable insights, and take account of the ethical and broader impacts of the research.
4.1.2. Data Collection, Preparation and Processing
The shift toward the use of real-world, messy data necessitates skills in data processing. The literature considers sources of data such as surveys, databases, web scraping, sensors, pictures, text, GPS, maps and audio. Attention is also given to the format of data, with both structured (e.g., databases) and unstructured formats (e.g., video, text, images) mentioned. Frameworks also attend to data collection methods such as experiments, surveys and observational studies. The cleaning of data to ensure quality is emphasised through references to handling missing values, removing duplicates, correcting errors and considering outliers. Equal consideration is given to the need to transform data through normalising or scaling, and to integrate data from different sources.
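To make the cleaning and transformation steps concrete, the following is a minimal Python sketch using only the standard library. The records, the function names `clean` and `min_max_scale`, and the decision to drop (rather than impute) missing values are all assumptions made for illustration; they are not drawn from any of the reviewed frameworks.

```python
# Hypothetical raw records: (student_id, height_cm); None marks a missing value.
raw = [
    ("s1", 150.0),
    ("s2", None),
    ("s3", 160.0),
    ("s1", 150.0),  # exact duplicate of the first record
    ("s4", 170.0),
]


def clean(records):
    """Drop exact duplicates and records with a missing value, keeping order."""
    seen = set()
    cleaned = []
    for record in records:
        if record in seen or record[1] is None:
            continue
        seen.add(record)
        cleaned.append(record)
    return cleaned


def min_max_scale(values):
    """Rescale values linearly to the interval [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


cleaned = clean(raw)
scaled = min_max_scale([height for _, height in cleaned])
print(cleaned)  # [('s1', 150.0), ('s3', 160.0), ('s4', 170.0)]
print(scaled)   # [0.0, 0.5, 1.0]
```

Dropping incomplete records is only one option; in a classroom investigation, the choice between dropping and imputing missing values is itself a useful point for discussion.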
4.1.3. Data Infrastructure and Big Data Technologies
These technologies are essential to data science because they enable the storage of data using databases and data warehouses, the processing of data, and cloud computing which offers cost-effective resources to handle computationally intensive tasks.
4.1.4. Data Governance, Ethics and Responsible Use
This component requires critical evaluation of data sources, including their reliability, their biases, and the ethical implications of using them. Ethical attention is needed to counteract bias, both personal and cultural, alongside biases built into data sets. Attention to data governance policies covering data access, data sharing and data retention is critical, as is the use of best practices for data storage and management. The use of open data requires consideration of the ethical implications of using public data, and using data for social good can develop appreciation for projects that use data for positive social impact. Where students collect their own data, they need to develop an understanding of informed consent, anonymisation, and respect for participants’ privacy.
4.1.5. Exploratory Data Analysis (EDA)
Broadly, exploratory data analysis approaches focus on the selection, calculation and interpretation of appropriate descriptive statistics (e.g., measures of centre, variability and correlation) to summarise and describe data, correlational analysis to explore relationships within the data, hypothesis testing, inference and pattern detection. Many of these exploratory data analytic techniques have their roots in statistical analysis and were originally developed within the discipline of statistics.
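As an illustration of the descriptive and correlational techniques named above, the sketch below computes measures of centre and variability and a Pearson correlation coefficient using Python's standard library. The data set (hours of study and test scores for five students) and the helper `pearson_r` are hypothetical and serve only to show the calculations.

```python
import math
import statistics

# Hypothetical paired data: hours of study and test score for five students.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 55.0, 61.0, 70.0, 72.0]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


summary = {
    "mean": statistics.fmean(scores),    # measure of centre
    "median": statistics.median(scores),  # measure of centre
    "stdev": statistics.stdev(scores),    # sample standard deviation
}
r = pearson_r(hours, scores)

print(summary)
print(round(r, 3))
```

With these made-up values the mean score is 62 and the median is 61, and the correlation is strongly positive; with real classroom data, interpreting such summaries in context matters more than the calculation itself.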
4.1.6. Data Visualisation and Visual Analytics
Data visualisation is a critical component of data science, enabling the representation of complex datasets through graphical means to facilitate understanding, pattern recognition, and decision-making. Unlike traditional statistical inquiry, where visualisation is primarily used to confirm hypotheses and summarise results (e.g., histograms, box plots, scatterplots), data science leverages visualisation as an interactive and dynamic tool for exploration. In data science, visualisation is frequently employed to identify patterns, anomalies, and relationships in large and unstructured datasets, often integrating real-time or multidimensional data representations that go beyond conventional statistical graphs. Data visualisation is closely connected to advancements in technology, enabling the use of interactive tools, real-time dashboards, and AI-driven analytics to enhance data exploration. It also leverages simulation and dynamic graphs, allowing users to model complex scenarios, test assumptions, and engage with data in ways that static graphs cannot. These technological integrations make visualisation a powerful medium not just for analysis, but also for communication and decision-making in data science.
4.1.7. Computational Thinking, Modelling, Algorithms, Machine Learning
Computational thinking involves breaking down complex problems into manageable parts, recognising patterns, and developing algorithms to process and analyse data efficiently. Key practices include decomposition, where data scientists dissect problems into smaller tasks; pattern recognition, which involves identifying trends or anomalies in data; and abstraction, which focuses on filtering out irrelevant information to highlight core aspects of the data. Additionally, algorithmic thinking is essential for designing step-by-step procedures that enable data analysis. Programming languages like Python and R play a crucial role in applying these practices, providing tools and libraries that allow data scientists to automate processes, handle large datasets, and implement complex analyses effectively. Machine learning comes into play here as it supports the development and application of algorithms that allow computers to make automated predictions from the data. Modelling is the process of creating a mathematical representation of a real-world situation using data. A model can be a mathematical equation, a statistical algorithm, or a machine learning algorithm that describes how different variables in the data relate to each other.
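One simple instance of a model as "a mathematical equation" is a straight line fitted to data by ordinary least squares. The Python sketch below is an illustrative example chosen by us, not a model prescribed by the reviewed frameworks; the data values and the function names `fit_line` and `predict` are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(xs, ys)


def predict(x):
    """Use the fitted model to predict y for a new value of x."""
    return slope * x + intercept


print(slope, intercept)  # 2.0 1.0
print(predict(4.0))      # 9.0
```

The function also shows decomposition and algorithmic thinking in miniature: the fitting problem is broken into means, sums of products, and a final step-by-step calculation.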
4.1.8. Statistical and Mathematical Foundations
Many definitions and frameworks retain a strong focus on probability theory and statistical inference alongside a foundation in linear algebra and calculus (important for algorithmic understanding and machine learning). For example, Bayesian inference, a fundamental concept in probability, is widely used in machine learning for updating beliefs in light of new data. Similarly, probabilistic models rely on probability theory to identify patterns, make predictions, and handle uncertainty in data-driven applications.
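The idea of updating beliefs in light of new data can be shown with Bayes' rule for a single yes/no hypothesis. The Python sketch below is illustrative only: the prior and the two likelihoods are invented numbers, the function name `posterior` is our own, and the second update assumes that the two pieces of evidence are conditionally independent.

```python
from fractions import Fraction


def posterior(prior, likelihood_if_true, likelihood_if_false):
    """Return P(H | E) for a binary hypothesis H using Bayes' rule."""
    evidence = prior * likelihood_if_true + (1 - prior) * likelihood_if_false
    return prior * likelihood_if_true / evidence


# Hypothetical values chosen for illustration.
prior = Fraction(1, 100)         # P(H) before seeing any evidence
p_e_given_h = Fraction(9, 10)    # P(E | H)
p_e_given_not_h = Fraction(1, 10)  # P(E | not H)

# Update once, then use the result as the prior for a second,
# conditionally independent observation of the same kind of evidence.
after_one = posterior(prior, p_e_given_h, p_e_given_not_h)
after_two = posterior(after_one, p_e_given_h, p_e_given_not_h)

print(after_one, after_two)  # 1/12 9/20
```

Exact fractions are used so that learners can follow each step by hand: a belief of 1 in 100 rises to 1 in 12 after one observation and to 9 in 20 after two.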
4.1.9. Domain Knowledge and Interdisciplinary Collaboration
The contextual relevance of data science is emphasised through the focus on collaborating with domain experts to ensure accuracy of data insights. Collaboration within the context of working in multidisciplinary and cross-functional teams is essential in data science, as real-world data problems require expertise from multiple fields to ensure robust analysis, ethical considerations, and meaningful interpretation. Disciplines such as computer science, mathematics, and statistics contribute technical foundations, while domain-specific fields like healthcare, finance, education, and environmental science provide context and application, and ethics, social sciences, and philosophy help address issues of bias, fairness, and societal impact.
4.1.10. Collaboration and Communication Skills
The importance of effective teamwork and communication is evident across many frameworks, which identify the need to develop students’ collaborative and communication skills. These skills prepare students to work with diverse teams, to communicate findings from complex data sets clearly and concisely, to document their data science process, and to interact with stakeholders to define project goals. They also prepare students to use the language of uncertainty when there is no clear right or wrong answer, and to collaborate in making responsible decisions when addressing ethical concerns around data privacy, bias, and fairness in data science projects. This incorporates attention to data communication through comprehensible report writing and storytelling with data.
4.2. How Data Science Is Described and Structured in School Curricula and Initial Teacher Education
The components outlined in the previous section describe the core processes and practices involved in doing data science, as they are represented across professional and conceptual accounts of the field. However, the ways in which these components are interpreted, prioritised, and structured within educational contexts, particularly at school level and in initial teacher education, are neither uniform nor comprehensive. The following section therefore examines how these core components of data science are taken up within curricula and frameworks designed for learners and pre-service teachers. Drawing on empirical studies, curricular frameworks, and programme descriptions, this analysis explores how data science processes and practices are translated into educational structures, highlighting areas of alignment, emphasis, and omission in school curricula and teacher education.
Our review of data science conceptual frameworks and the research literature reporting on data science courses for school-level students, teachers and other undergraduate students points towards components considered necessary for non-data science majors. There are a small number of data science frameworks specifically designed for educators, including schoolteachers and higher education instructors. These frameworks aim to integrate data science principles into the classroom, helping teachers guide students in understanding data, developing questions, utilising computational thinking skills, and applying data-driven decision-making, amongst other emphases. A feature common to many of these frameworks is a combined emphasis on data science processes and data science practices. These broadly align with the components of data science described in the previous section.
The reviewed frameworks vary not only in structure but also in the grade levels they target, and this shapes the emphasis placed on different components of data science. For example, the IDS and IDSSP frameworks, developed for high school learners, engage with a broad range of data science processes, including computational modelling, algorithmic thinking, and big data technologies. In contrast, the
Ow-Yeong et al. (
2023) framework, designed for elementary mathematics education, places greater emphasis on problem formulation, real-world data contexts, exploratory data analysis, and visualisation, as well as ethical awareness and data storytelling. The activities described in their framework are closely tied to curriculum-aligned content (e.g., bar graphs, pictographs) but framed through the lens of open, authentic inquiry using real data. Notably, components such as machine learning, infrastructure, or advanced modelling are absent, reflecting age-appropriate pedagogical choices. This distinction reinforces the importance of aligning data science education with learners’ cognitive and developmental readiness and illustrates how the components in Theme 1 may be differently prioritised across educational levels.
4.2.1. Data Science Processes
Data science processes are “the steps taken in order to achieve full understanding of data science procedures, including formulating questions, collecting data, analyzing data (visuals and models), and communicating results” (
Thompson & Arastoopour Irgens, 2022, p. 30). This emphasis is visible in many of the frameworks that represent the data science process as a structured, systematic sequence of steps or stages to follow in order to solve a data-driven problem, i.e., a workflow that guides the entire lifecycle of a data science project. In the field of statistics education, there have been several frameworks consisting of four-phase (
Franklin et al., 2007;
Graham, 1987) and five-phase statistical inquiry cycles (
Watson et al., 2017;
Wild & Pfannkuch, 1999) representing approaches to solving problems with data. Examination of data science frameworks, described below, suggests broad alignment with these statistical inquiry cycles.
The
Introduction to Data Science (IDS) curriculum (
Gould et al., 2022), designed for high school students, covers key data science concepts such as data collection, cleaning, visualisation, analysis and real-world data projects that engage students in hands-on learning. A representation of a four-phase data cycle is used in the IDS curriculum to illustrate some data science processes (see
Figure 2). The cycle of learning from data is also placed at the centre of data science (
Figure 3) in the
International Data Science in Schools Project (IDSSP) (
IDSSP Curriculum Team, 2019), an initiative aimed at developing a comprehensive framework and curriculum for teaching data science in schools worldwide. In contrast, the outer circle of
Ow-Yeong et al.’s (
2023) depiction of data science describes the data science process (see
Figure 4); these broadly align with the IDS and IDSSP data cycles. However, this framework incorporates specific reference to data science practices and locates them within mathematics and statistics (e.g., probability theory), computer science (e.g., data processing, programming, algorithms) and domain applications (e.g., sciences, public policy). The
Data Science Ethos Lifecycle (
Boenig-Liptsin et al., 2022) presents a six-stage data science workflow that incorporates a focus on exploratory data analysis and the use of analytical tools such as modelling (
Figure 5). Similarly, the six-phase data investigation process proposed by
H. Lee et al. (
2022) (
Figure 6) incorporates the consideration of models. The authors emphasise that the distinguishing feature of models within the context of data science is that data scientists choose “specific models as evidence to support claims that address an investigative question, often discarding models that do not help answer a question” (p. 13). The authors also make efforts to communicate the, at times, nonlinear and dynamic nature of the work that can occur simultaneously within and between phases.
These frameworks emphasise the data science process, which consists of steps closely aligned with well-established statistical inquiry cycles. In addition, aligned with these steps, they incorporate several key data science components outlined in the previous section (
Section 4), notably
Data Preparation and Processing (Collect Data),
Exploratory Data Analysis and
Data Visualisation (Analyse Data), and
Communication Skills (Communicate Conclusions). These components are also identified in research examining and critiquing data science education for pre-collegiate (
Adisa et al., 2024;
LaMar & Boaler, 2021;
V. R. Lee & Delaney, 2022) and undergraduate (non-data science majors) students (
B. Baumer, 2015;
Li et al., 2023;
Yan & Davis, 2019). Arising from her review of over 250 peer-reviewed articles, book chapters, and International Association for Statistical Education (IASE) conference proceedings,
Davidson (
2024) emphasises that it is the use of investigative projects in statistics and data science courses that allows students to experience the entire cycle of a statistical or data science investigation process.
A key feature shared by data science processes and traditional statistical inquiry cycles, particularly emphasised in frameworks and studies related to teachers, is the importance of developing statistical questions and identifying problems (e.g., the IDS curriculum, the Data Science Ethos Lifecycle, and the IDSSP framework). These frameworks, along with several research studies, highlight the critical role of asking meaningful and relevant questions in educational contexts (
LaMar & Boaler, 2021;
Leavy & Frischemeier, 2022;
V. R. Lee & Delaney, 2022;
Yan & Davis, 2019). For example,
Dichev and Dicheva (
2017) identified the ability to formulate productive questions as an essential skill in their recommended set of competencies for a general education data science course for non-technical students.
4.2.2. Data Science Practices
Compared to data science processes, data science practices are more focused on the day-to-day activities, tools and ethical considerations within each step of the data science process. The practices focus on how tasks are carried out on a granular level, which techniques and tools are employed, and why certain methodologies may be preferred. This relationship between steps in the data science process and the associated data science practices that support those steps is highlighted in the original IDSSP framework, in
H. Lee et al.’s (
2022) data investigation process and in
Ow-Yeong et al.’s (
2023) framework. Because the schematics presented in this paper focus on overarching processes rather than detailed practices, we encourage readers to consult the original articles and visual frameworks for a fuller representation of the specific data practices aligned with each phase of the data science cycle.
Data Collection, Preparation and Processing
Data science, by its nature, deals with larger and more complex data sets than those traditionally used in school contexts. These data are often characterised by the multiple Vs (see
Kitchin & McArdle, 2016) of volume (enormous amounts), velocity (rapid and real-time creation of data), variety (heterogeneous nature of data), value (usefulness of data due to the multiple insights that can be gained) and veracity (truthfulness, accuracy and precision), though additional characteristics are also noted. The
preparation and processing of such data to make it suitable for analysis poses considerable challenges within the classroom context (
Kjelvik & Schultheis, 2019). However, attention to the proper preparation of data has been provided in data science courses for high school students (
LaMar & Boaler, 2021), undergraduate liberal arts students (
B. Baumer, 2015) and for undergraduate data analytics and statistics students who explore web scraping techniques (
Dogucu & Çetinkaya-Rundel, 2021). Attention to data preparation and processing is also incorporated into several frameworks, including the ‘processing phase’ of
H. Lee et al. (
2022), which encompasses data organising, structuring, cleaning, and transforming. Similarly, the IDSSP framework highlights this aspect in its ‘getting data’ phase, which involves data harvesting and wrangling/munging. The centrality of this data science practice is also evident in the data science framework developed by
Keller et al. (
2020), which positions data wrangling (data profiling, preparation and linkage) at the centre of its framework.
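To make the processing practices named above (organising, structuring, cleaning and transforming) more concrete, the following minimal Python sketch shows what they might look like on a very small classroom data set. It is an illustration only and is not drawn from any of the cited frameworks or curricula; the survey data, the column names, the GRADE_WORDS mapping and the clean_rows function are all hypothetical.

```python
import csv
import io

# Hypothetical raw survey export: inconsistent labels, a blank cell,
# a grade written as a word, and a duplicated row.
RAW = """student,grade,hours_sleep
Ana,6,8
ben ,6,
Cara,Six,7.5
Ana,6,8
"""

GRADE_WORDS = {"six": 6}  # assumed mapping, sufficient for this toy example only


def clean_rows(text):
    """Organise, clean and transform raw CSV text into typed records."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        # Structuring: make name labels consistent.
        name = row["student"].strip().title()
        # Transforming: convert the grade to an integer, whether typed as digit or word.
        grade_raw = row["grade"].strip().lower()
        grade = GRADE_WORDS.get(grade_raw)
        if grade is None:
            grade = int(grade_raw)
        # Cleaning: record missing values explicitly rather than guessing them.
        sleep_raw = row["hours_sleep"].strip()
        sleep = float(sleep_raw) if sleep_raw else None
        # Cleaning: drop exact duplicate records.
        record = (name, grade, sleep)
        if record in seen:
            continue
        seen.add(record)
        cleaned.append({"student": name, "grade": grade, "hours_sleep": sleep})
    return cleaned


for record in clean_rows(RAW):
    print(record)
```

Even in this small case, each step involves a judgement (for example, whether a blank cell should be kept as missing or the row discarded), which is the kind of decision that the frameworks above ask learners to make explicit.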
Data Infrastructure and Big Data Technologies
As teachers increasingly engage their students in data science and engage in handling large volumes of educational data themselves, it is important for them to grasp the basics of
big data technologies. Indeed, the use of “big data” is becoming more prevalent as evident in a study by
Liston et al. (
2022) who, responding to a brief from elementary students, used an IoT environmental monitoring system to collect, analyse and visualise data on the light, humidity, sound and temperature in the school environment, and
Higgins et al. (
2022) who engaged 11–14-year-olds in analysis of open source data from the US Centers for Disease Control and Prevention (CDC). Another example is the Mobilize Introduction to Data Science (IDS) curriculum (
Gould et al., 2016) which uses an open-source system that supports and manages the flow of publicly available data from mobile devices to the classroom whereupon the data are analysed by students in an effort to understand their community. Studies have also engaged pre-service teachers in analysing big data through analysis of heat maps sourced from publicly available data showing crime statistics in two major cities (
Andersson & Register, 2023).
Data Governance, Ethics and Responsible Use
Ethical considerations “lie at the heart of data science” (
National Academies of Sciences, Engineering, and Medicine, 2018, p. 30). Across the literature there is growing awareness of the need for teachers and non-DS majors to attend to
data governance and ethics in data science because this will equip students with the knowledge and skills to use data responsibly, ethically, and effectively. Furthermore, it helps students understand the broader impact of their work, ensures they handle data with care, promotes fairness and equity, and prepares them for the ethical challenges they will encounter in their professional lives. For teachers, incorporating these topics into the curriculum is vital for developing not just skilled data scientists, but also conscientious and socially responsible professionals. The importance of attending to ethics is emphasised in the literature for undergraduate students who are non-DS majors (
Dichev & Dicheva, 2017), alongside teachers of primary (
Fry & Makar, 2021), secondary (
V. R. Lee & Delaney, 2022) and college (
B. S. Baumer et al., 2022;
Li et al., 2023) students. The issue of ethics transcends all aspects of data science. For example, the increasing use of machine learning models in education has led to concerns about the reliance on recommendations from such models (
Bach et al., 2022) and has resulted in efforts to establish the ways in which college students justify the usability of self-built models (
Bata et al., 2025).
Awareness of personal biases and how they might impact the questions posed, variables selected, and communication of conclusions has been identified by
Wild and Pfannkuch (
1999). In relation to liberal arts students,
B. S. Baumer et al. (
2022) identify important DS ideas in relation to the ethics of data science as: (1) Ethical precepts for data science and codes of conduct, (2) Privacy and confidentiality, (3) Responsible conduct of research, (4) Ability to identify “junk” science, and (5) Ability to detect algorithmic bias. However, there appears to be consensus that data governance and ethics do not receive sufficient attention in college data science programmes and require continued and focused attention (
Oliver & McNeil, 2021). An effort to examine pre-service mathematics teachers’ ethical reasoning in big data carried out by
Andersson and Register (
2023) indicated that pre-service teachers presented a diverse range of ethical arguments related to data access, which supported their efforts to critically examine oppressive situations. However, their reasoning may be constrained by a limited understanding of data science methodologies, suggesting a need for greater emphasis on these concepts in mathematics teacher education. Evidence to support the role played by data science methodologies, in this case understanding of specific model knowledge, is provided by
Lieben and Gürtler (
2025) who found that such knowledge increased the accuracy of German 10th grade students’ interpretations of the stochastic variations in simulations when examining scientific epidemic models. However, being a data science major is not sufficient to ensure proper grounding in ethics as identified by
Oliver and McNeil (
2021) who reviewed 18 undergraduate data science programmes and found they lack sufficient focus on the ethics of data use and misuse.
Data Visualisation
Research on data science education at the school level (K-12) and in teacher education highlights the essential role of data visualisation in fostering data literacy and analytical skills. Several school-level curricula emphasise visualisation as a means of engaging students with real-world data, supporting inquiry-based learning, and enhancing computational thinking (
Weiland & Engledowl, 2022). The IDSSP framework (International Data Science in Schools Project) integrates visualisation as a core component, encouraging students to explore and interpret data dynamically rather than relying solely on numerical summaries. Similarly, the Bootstrap: Data Science curriculum (
Schanzer et al., 2022) incorporates visualisation as a foundational tool for exploring relationships in datasets and making sense of complex information. Growth in use of “data talks” and data discussion tasks about socially relevant data visualisations (e.g.,
LaMar & Boaler, 2021;
Flavin & Suh, 2024;
Wilkerson et al., 2025), alongside use of frameworks to support discussion about data displays (
Friel et al., 2001;
Thrasher et al., 2024), and data visualisations focusing on important civic issues (e.g., Gapminder,
https://www.gapminder.org (accessed 1 January 2026)) have drawn greater attention to the use of visualisation in school statistics. In teacher education, studies have shown that pre-service teachers often lack confidence in their ability to teach data visualisation (
Groth & Meletiou-Mavrotheris, 2017) and experience challenges interpreting non-traditional representations (
Gonzales, 2025), despite recognising its importance in developing students’ data reasoning skills. However, research also suggests that when pre-service teachers engage in hands-on experiences with visualisation tools and interactive data exploration, their understanding of data science concepts improves, and they become better equipped to integrate visualisation into their teaching practices (
Wilkerson et al., 2025). The importance of starting an introductory data science course with visualisation is emphasised by
Çetinkaya-Rundel and Ellison (
2021), as it leverages students’ intuitive understanding and allows for a gradual transition to more complex statistical concepts. Additionally, visualisation provides immediate feedback, making errors easier to detect compared to tasks like data wrangling or modelling. These findings underscore the need for explicit instruction in data visualisation within both K-12 curricula and teacher education programmes to ensure that future educators can effectively incorporate visual data representations in their classrooms.
Exploratory Data Analysis (EDA)
Research over the past several decades has extensively explored ways to support pre-service teachers in developing statistical reasoning, recognising that many of these skills are foundational to data science, resulting in the development of a strong literature base on teaching and learning of data and statistics (
Weiland & Engledowl, 2022). Studies indicate that while pre-service teachers often have familiarity with basic descriptive statistics, they may struggle with deeper conceptual understandings of variability, correlation, and inference, key components of Exploratory Data Analysis (EDA) (
Garfield & Ben-Zvi, 2008;
Shaughnessy, 2007). Efforts to enhance statistical reasoning have included integrating dynamic data visualisation tools, simulations, and real-world data sets into teacher education programmes to promote a more intuitive and inquiry-driven approach to data analysis (
Groth & Meletiou-Mavrotheris, 2017;
De Vetten et al., 2023). Importantly, the ability to interpret trends, recognise patterns and make data-driven inferences aligns closely with the skills required in data science. As such, strengthening pre-service teachers’ EDA competencies not only enhances their statistical literacy but also prepares them to engage meaningfully with data-intensive problems across educational and professional contexts.
Computational Thinking, Modelling, Algorithms and Machine Learning
The importance of Computational Thinking, Modelling, Algorithms and Machine Learning is evident in the recent literature. The role of programming skills in statistics and data science practice is being increasingly acknowledged (
Nolan & Temple Lang, 2010); however,
Horton and Hardin (
2021, p. 51) refer to the “notable gap … between our intentions and our actions”. In a Special Issue dedicated to integrating computing in the statistics and data science curricula,
Horton and Hardin (
2021) identified three non-mutually exclusive approaches that might be fruitful in this regard: creative restructuring of curricula, the incorporation of novel or technical data science skills into statistics courses (for example, web scraping) and implementing computational thinking skills into courses. Indeed, this ability to think computationally was identified as a key skill by
Dichev and Dicheva (
2017) in their design of a general education course on data science for non-technical students. The degree to which high level computational or programming skills are required for non-data science majors, however, remains unclear with
Overton and Kleinschmit (
2022, p. 362), in their description of a Data Science Literacy Framework to incorporate data science principles into public administration programmes, referring to the ‘faulty assumption that data science tasks require statistical and computational sophistication’.
Programming receives mention in several frameworks and curricula for school-level data science. In
Ow-Yeong et al.’s (
2023) data science framework, programming is located within the domain of computer science. There is reference to programming in the
Bootstrap: Data Science curriculum (
Schanzer et al., 2022) in the ingredient called ‘Computing’ which also incorporates attention to data acquisition, management and cleaning. Similarly,
Weiland and Engledowl (
2022) argue for the need to teach programming in K-12 curricula that is relevant for data wrangling and analysis. Some school curricula in the early years of school education lay the foundations for programming through the use of visual languages and block-based coding platforms (
Datta & Nagabandi, 2017). Focusing on the later elementary years, a study by
Thompson and Arastoopour Irgens (
2022) with 11–13-year-olds used a combination of non-programming activities (e.g., use of Google Trends to explore and visualise data from public Google searches) and programming activities (using the language R). A study by
Schönbrodt and Franke (
2025) explored how the foundational principles of machine learning can be mapped onto mathematics curricula. Focusing on classification problems and Support Vector Machines (SVM), they examined how key mathematical concepts, such as distances and the dot product, can be introduced through structured learning trajectories in secondary education. However, despite the increasing recognition of computational thinking, modelling, algorithms, and machine learning in data science education, many educators and learners still face challenges in acquiring the necessary competencies. The gap between the intended integration of programming skills and their actual implementation in statistics and data science education, as noted by
Horton and Hardin (
2021), underscores this need. Similarly,
Msweli et al. (
2023), in a scoping review of data science education, identified a lack of competencies in working with data platforms, models, and tools as a key challenge in data science education. Addressing these gaps requires continued efforts to embed computational thinking and programming into curricula in ways that are accessible and relevant for both pre-service teachers and students at various educational levels.
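The way in which distances and the dot product underpin classification, noted above in relation to Support Vector Machines, can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions and is not taken from the cited learning trajectories: the weight vector w and offset b are hypothetical values chosen by hand rather than learned from data, and the function names are our own. The sketch shows only how a separating line is used once it is known, not how an SVM finds it.

```python
import math


def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def signed_distance(point, w, b):
    """Signed distance from a point to the line (or hyperplane) w . x + b = 0."""
    return (dot(w, point) + b) / math.sqrt(dot(w, w))


def classify(point, w, b):
    """Assign class +1 or -1 according to the side of the line the point lies on."""
    return 1 if dot(w, point) + b >= 0 else -1


# Hypothetical separating line 3x + 4y - 10 = 0 (chosen by hand, not trained).
w = (3.0, 4.0)
b = -10.0

for point in [(4.0, 2.0), (0.0, 0.0), (2.0, 1.0)]:
    print(point, classify(point, w, b), signed_distance(point, w, b))
```

The sign of the dot product expression determines the predicted class, while its magnitude, scaled by the length of w, gives the distance to the boundary; this is the quantity that a Support Vector Machine seeks to make as large as possible for the closest training points.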
Domain Knowledge and Interdisciplinary Collaboration
Domain knowledge and interdisciplinary collaboration are key components of data science. The need for domain-specific education was emphasised by
Oliver and McNeil (
2021) in their evaluation of undergraduate data science programmes at a subset of 4-year institutions in the United States. The authors acknowledged the degree to which institutions were addressing communication in data science courses. The importance of developing teamwork and collaboration skills in data science courses was outlined by
Vance (
2021), who recommended a pedagogical strategy called Team-Based Learning that enhances these skills. It was also acknowledged by
Wu et al. (
2023, p. 626), who state that “data science requires a multidisciplinary approach. Not only does it need to be closely tied to data-driven technologies, but it also needs to represent the contributions of different disciplines in data science objectively.”
Collaboration and Communication Skills
The data science literature and associated frameworks emphasise the critical role that communication and collaboration skills play in the life of a data scientist (
Kauermann & Seidl, 2018;
National Academies of Sciences, Engineering, and Medicine, 2018;
H. Lee et al., 2022) with
Roseth et al. (
2008, p. 1) aptly stating that “Collaboration is not just an end goal of statistics instruction but also a means to help students learn statistics.” Consequently, approaches such as small-group cooperative learning (
Kalaian & Kasim, 2014;
Roseth et al., 2008) and team-based learning have been adopted within statistics and data science courses (
Vance, 2021). The importance of communicating results and making evidence-based claims is embedded in all data science frameworks (e.g., IDSSP framework,
H. Lee et al., 2022).
In conclusion, the analysis presented in this theme demonstrates that while many data science curricula and frameworks emphasise common processes and practices, there is considerable variation in how these are structured, prioritised, and pedagogically enacted. This is particularly the case for learners who are not data science majors. Across school curricula and initial teacher education, core components of data science are often introduced in fragmented ways, with limited guidance on how pre-service teachers might coherently experience and integrate the processes and practices involved in doing data science. These findings highlight the need for a conceptual framework that is explicitly designed for teacher education: one that foregrounds data science practice, supports non-specialist learners, and provides a clear structure for engaging pre-service teachers in authentic data science activity. The EDUCATE framework was developed in response to this need.
5. The EDUCATE (Empowering Data Science Understanding for Teacher Education) Conceptual Framework
The EDUCATE framework draws directly on the core components of data science identified in Theme 1 and the curricular patterns and challenges identified in Theme 2, translating these insights into a pedagogically oriented structure for initial teacher education. The framework is targeted at two audiences:
Preservice teachers. Preservice teachers do data science from a learner’s perspective. For this audience, the framework is about doing data science.
Instructors of preservice teachers. For this audience, the framework is about teaching and doing data science. They are teaching preservice teachers and bringing them through the activities that involve doing data science.
While the EDUCATE framework outlines core dimensions of data science practice and pedagogy, it is not intended as a one-size-fits-all model for K-12 education. Instead, it is designed to support pre-service teachers and teacher educators in understanding and implementing data science in developmentally appropriate and context-specific ways. The relevance and application of each component will naturally vary by school grade level. For example, teachers of upper primary students may engage them in data collection, visualisation and ethical reasoning using real-world contexts, while secondary students may explore more advanced topics such as modelling, algorithmic thinking, or predictive analysis. Consequently, the framework is intended to be adaptable and to provide a structure to guide pedagogical planning.
The framework (see
Figure 7) describes the four-component PROCESS for doing data science and presents the data science
processes and aligned data science
practices. Its intent is to support the design of learning experiences for preservice teachers that guide them in navigating both the processes and practices involved in doing data science, enabling them to develop a coherent and integrated understanding of data-driven inquiry.
The data science process, tailored for pre-service teachers, is a streamlined four-component process. Data science is not seen as a cyclical process, but rather as four components that interconnect and do not have a fixed sequential order: Get and explore data → formulate problem → model/analyse data → communicate results and action plan.
The nine data science practices are aligned with the four components of the data science process, with many of the practices occurring during all of the processes. The practices focus on how tasks are carried out on a granular level, which techniques and tools are employed, and why certain methodologies may be preferred. The data science practices associated with the data science processes are described in
Table 3.
Get and Explore Data: This process involves gathering data either firsthand or by sourcing pre-existing data sets (often large and open source). Where pre-existing or secondary data are sourced, exploration of the data set also occurs here to gain insights into the structure of the data (e.g., identification of variables, types of data, distributions, visualisations).
Formulate the Problem: This process involves defining and framing the specific question or problem that the data science project aims to solve. Problem formulation is usually the first step in the data science process which then drives the gathering of data. Teachers, however, often source publicly available data sets for use in classroom contexts and in these situations, problem formulation may occur after the data have been sourced or explored.
Model/Analyse the problem: This process involves exploring data to uncover patterns, trends, and insights relevant to the problem. Where relevant, pre-service teachers are introduced to predictive modelling techniques that can be applied to reveal insights into the data.
Communicate Results and Engage in an Action Plan: The final process involves creating a narrative that explains the key findings and subsequently translating these data insights into actionable recommendations for self and society.
Table 3 is a synthesis of ten components that describe the nature and scope of data science practices, based on an integrative review of existing frameworks, curricula and empirical literature. While these components reflect broadly agreed-upon aspects of professional data science work, the table is not a direct extraction from the literature. Instead, it represents an amalgamation of research-based insights and reasoned judgement by the research team, guided by a commitment to educational relevance. Each component was carefully reviewed to determine its suitability and adaptability for use in school-based data science instruction, particularly in primary and secondary contexts.
Some elements were judged to have limited direct transferability to school settings due to their reliance on complex tools, technical expertise, or infrastructure that may not be available in educational environments. Conversely, other areas such as problem formulation, exploratory data analysis, and data ethics were not only well-supported in the literature but also highly relevant and feasible for classroom implementation, and thus were emphasised. The resulting table provides a conceptual bridge between professional practice and school-level adaptation. It foregrounds the core practices that teachers need to understand and help students engage with, without assuming specialist training.
Positioning the EDUCATE Framework in the Landscape of Data Science Education
The EDUCATE framework builds upon and extends existing models of data science practice. In contrast to other influential frameworks designed for professional or upper-secondary learners (
H. Lee et al., 2022), the EDUCATE framework was explicitly developed for use within initial teacher education, targeting pre-service teachers as non-specialist learners who are preparing to teach data science at the school level. To meet the needs of this audience, in contrast to frameworks comprising five or six distinct steps (
Boenig-Liptsin et al., 2022;
H. Lee et al., 2022;
Ow-Yeong et al., 2023), EDUCATE adopts a streamlined structure comprising four core processes that span the data science cycle. This simplification is intentional: by reducing the cognitive and technical demands placed on users, the framework aims to make data science more accessible to those without formal training in statistics, computer science, or data science.
A further point of differentiation lies in the treatment of data science practices. Unlike several other models that tightly link specific practices (e.g., data wrangling, modelling, or visualisation) to a discrete data science process, the EDUCATE framework intentionally does not align specific practices with specific processes. This design reflects the recognition that data science investigations are highly contextual, and that particular practices may be utilised in several of the data science processes in a project. This flexible approach recognises the iterative, non-linear nature of real-world data inquiry.
Lastly, the EDUCATE framework emphasises the development of action-oriented outcomes. Like
H. Lee et al. (
2022), it includes a final process focused on communication and action, a step often missing in other frameworks. This reflects a growing understanding that data science, especially in educational contexts, should not end with interpretation but should inform decision-making and engagement with real-world issues. Together, these features (accessibility for non-specialists, flexibility in practice-process alignment, and a focus on purposeful action) position the EDUCATE framework as a novel and pedagogically grounded contribution to the field of data science education.
6. Beyond the Framework: Practical Considerations for Implementation
While the EDUCATE framework outlines the core processes and practices of doing data science, additional practical considerations are essential for those planning to teach data science in real classroom settings. This section is intended primarily for pre-service teachers (PSTs) who are beginning to think about how they might design and deliver a data science module in their future classrooms. It may also support teacher educators seeking to help PSTs bridge the gap between conceptual understanding and classroom practice. A recurring question for PSTs, “What do I need to consider when teaching data science in my classroom?”, is an important one, as data science offers pedagogical challenges and possibilities that differ in important ways from traditional statistics instruction. Embracing data science often requires a shift in mindset, one that involves greater openness to complexity, uncertainty, and the messiness of real-world data. These considerations are not intended to replace or extend the framework, but rather to support pre-service teachers in making informed pedagogical decisions when enacting the framework in classroom settings.
At the school level, the most immediate difference is the nature of the data used. Unlike the tidy, clean, and often contrived data sets used in traditional statistics lessons, data science frequently engages students with large, messy, multi-variable, real-world datasets. These data may require preparation, cleaning, and transformation before they are even ready for analysis, tasks that are not typically part of school-level statistics instruction but are central to data science. Teachers must therefore make careful choices about how to scaffold students’ experiences with such data while still maintaining cognitive and curricular coherence. The INSTEP programme at NC State (
https://instepwithdata.org/public/about/ accessed 1 January 2026), and seminal publications such as
Ben-Zvi et al. (
2017), incorporate several of the considerations discussed below. Building on that model and the outcomes from our review, we present four key pedagogical aspects that pre-service teachers and teacher educators should reflect on when planning, teaching, and evaluating a data science activity. Alongside each consideration, we offer an illustrative example of how it may translate into classroom decision-making.
6.1. Context
Real-world contexts are at the heart of data science teaching. They not only ground the content in students’ everyday lives but also demonstrate the relevance and power of data science for understanding the world. Teachers should select contexts that are authentic, engaging, and meaningful for their students, whether drawn from sports analytics; environmental data and climate change; social media and digital trends; health and wellness; civic data (e.g., voting, local issues); social justice, equity, and sustainability; or economics, food, and agriculture. The choice of context can affect student motivation, identity, and participation in powerful ways, particularly when connected to issues students care about. For example, a 6th grade class might analyse local weather data as part of an integrated science and mathematics unit, while a secondary class might explore social media trends to examine digital behaviours or misinformation. Aligning the context of these data science tasks with students’ lived experiences and curricular goals increases relevance and, subsequently, engagement.
6.2. Key Ideas in the K–12 Curriculum
While data science is broader than statistics, statistical thinking remains foundational. PSTs must be able to recognise and build on key statistical concepts that appear in national or regional curricula. These may include types of data (categorical, numerical), data collection methods and sampling, descriptive statistics (mean, median, variability), data visualisation, distributions and patterns, relationships and correlation, inferential reasoning and prediction, and concepts of bias and uncertainty. By connecting data science activities to existing curriculum content, teachers can more easily integrate them into lessons and assessment practices, while also extending students’ conceptual understanding.
In elementary settings, students might learn to distinguish categorical from numerical data using classroom surveys or weather data. In secondary classrooms, they could engage in exploratory analysis of sports analytics or economics datasets using scatterplots and begin to model relationships using regression or informal inferential reasoning.
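As one possible illustration of the secondary-level activity described above, the following minimal Python sketch fits a least-squares line to a small set of paired values, the kind of relationship students might first explore in a scatterplot. The data, the variable names, and the `fit_line` helper are hypothetical and are not drawn from the reviewed studies; the sketch uses only the Python standard library.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-deviations and sum of squared deviations in x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical classroom data: weekly training hours and goals scored.
training_hours = [1, 2, 3, 4, 5]
goals_scored = [2, 4, 5, 4, 5]

slope, intercept = fit_line(training_hours, goals_scored)
print(f"goals = {slope:.2f} * hours + {intercept:.2f}")
```

In a classroom, students might compare the fitted line with their own informal line of best fit on the scatterplot and discuss what the slope does, and does not, say about the relationship.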
6.3. Tools
Data science instruction is often supported by digital tools. Tools must be chosen carefully and should be developmentally appropriate, accessible, and relevant to the learning goals. Some tools support data visualisation and exploration, others enable statistical analysis, while others are useful for programming, cleaning, or modelling data. Useful tools may include spreadsheets (e.g., Excel, Google Sheets), programming languages (e.g., R, Python with Jupyter Notebooks), visualisation platforms (e.g., CODAP, Tableau Public, Datawrapper) and data sources for real-world datasets (e.g., Kaggle, Gapminder, government portals). Teachers do not need to be experts in all tools but should be comfortable enough to guide students and make choices that align with learning outcomes. For younger students, tools such as CODAP offer low-floor entry to data analysis and basic visualisation without requiring programming. In contrast, high school students might use Python (via Jupyter notebooks) to engage with larger or more complex datasets. Choosing among such tools means balancing teacher confidence and cognitive demand with technical accessibility.
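To give a sense of what a first Python notebook cell might look like for high school students, the sketch below reads a small table and reports basic descriptive statistics alongside counts for a categorical variable. The weather records, the column names, and the `summarise` helper are hypothetical; the data are inlined so that the example runs without an external file, whereas a real lesson would more likely load a dataset from one of the sources named above.

```python
import csv
import io
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical weather records, inlined so the sketch runs without a file.
RAW = """day,condition,temp_c
Mon,sunny,18
Tue,rainy,14
Wed,rainy,13
Thu,cloudy,15
Fri,sunny,20
"""


def summarise(text):
    """Summarise one numerical and one categorical column of CSV text."""
    rows = list(csv.DictReader(io.StringIO(text)))
    temps = [float(row["temp_c"]) for row in rows]  # numerical variable
    conditions = Counter(row["condition"] for row in rows)  # categorical variable
    return {
        "n": len(rows),
        "mean_temp": mean(temps),
        "median_temp": median(temps),
        "stdev_temp": stdev(temps),  # sample standard deviation
        "conditions": conditions,
    }


summary = summarise(RAW)
for key, value in summary.items():
    print(key, value)
```

The same summary could be produced in a spreadsheet or in CODAP; the choice depends on the learning goals and on how much programming overhead is appropriate for the class.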
6.4. Assessment Practices
Assessment in data science education should capture not just students’ technical skills, but also their ability to think critically with data, communicate findings, and engage in ethical reasoning. Traditional assessments may have a place, but authentic, performance-based assessments are often better suited. Ideally, assessment strategies should reflect the iterative and investigative nature of data science work, providing space for creativity, revision, and reflection. Possibilities include individual or group projects, written reports or presentations, interactive data visualisations, data interpretation tasks, portfolios, peer- and self-assessment, and rubric-based evaluations of inquiry, reasoning, or ethical decision-making.
In classrooms with younger learners, assessment might focus on interpreting bar charts, making a claim using data, or describing patterns in data aloud or in writing. In secondary settings, students might design a data investigation and communicate findings through infographics or data reports, assessed using rubrics that value technical skill, reasoning, clarity, and ethical considerations.
By considering these four dimensions—Context, Curriculum, Tools, and Assessment—PSTs and teacher educators can better prepare to implement meaningful data science education. These considerations complement the EDUCATE framework by offering practical guidance on bringing the framework to life in diverse classroom settings. Ultimately, the goal is not only to teach data science content, but also to empower teachers and students to engage thoughtfully, ethically, and critically with data, skills that are increasingly essential in today’s world.
7. Summary and Conclusions
This report has explored the urgent need to embed data science education into initial teacher education. The systematic literature review and framework analysis suggest that while data science is increasingly central to societal and professional life, its integration into teacher preparation remains fragmented and underdeveloped. The review highlights key processes and practices of data science that are essential for pre-service teachers, not only to understand data science as learners but also to teach it effectively in school contexts. These include data preparation, exploratory analysis, modelling, visualisation, and sustained engagement with ethical issues and data governance. A notable emphasis across the literature is the importance of domain knowledge, interdisciplinary collaboration, and communication skills, positioning data science as a deeply contextual and socially relevant field.
Drawing from the findings, the EDUCATE framework was developed to support both the doing and teaching of data science in pre-service teacher education. It provides a flexible, practice-informed structure that aligns data science processes with critical pedagogical considerations, offering a pathway for teacher educators to meaningfully incorporate data science into their programmes. The framework acknowledges the complexity and dynamic nature of data science, while remaining accessible and relevant for pre-service teachers who are not data science specialists.
In addition to the conceptual framework, the report highlights key pedagogical considerations for implementation, including context, key curriculum ideas, tools and assessment. These are not intended as instructional prescriptions, but rather as guiding prompts to support pre-service teachers preparing to bring data science into their future classrooms. They complement the framework by addressing practical concerns such as selecting meaningful contexts, aligning with curriculum goals, choosing appropriate tools, and assessing student understanding in authentic ways. Such guidance is particularly important given the distinctive challenges of teaching data science, especially the use of complex, real-world datasets that differ significantly from those typically used in statistics education.
Taken together, the framework and the accompanying considerations contribute to ongoing efforts to develop data-literate teachers who can foster critical, inquiry-based, and ethically grounded engagements with data in their future classrooms. The EDUCATE framework is intended to be flexible and adaptable, allowing teacher educators to emphasise different processes and practices depending on grade level, curricular alignment, and available resources. In doing so, it provides a shared language and structure for thinking about data science education in schools, while leaving space for professional judgement and contextual adaptation. As the digital and data landscape continues to evolve, supporting teachers to navigate data science thoughtfully and responsibly will remain central to preparing students for participation in an increasingly data-intensive world.