Review

Curriculum, Pedagogy, and Teaching/Learning Strategies in Data Science Education

by Cecilia Avila-Garzon * and Jorge Bacca-Acosta
Faculty of Mathematics and Engineering, Fundación Universitaria Konrad Lorenz, Bogotá 110231, Colombia
* Author to whom correspondence should be addressed.
Educ. Sci. 2025, 15(2), 186; https://doi.org/10.3390/educsci15020186
Submission received: 13 December 2024 / Revised: 15 January 2025 / Accepted: 24 January 2025 / Published: 5 February 2025
(This article belongs to the Special Issue Theory and Research in Data Science Education)

Abstract

Data science education is an interdisciplinary and multidisciplinary field, with curricula continually evolving to meet societal needs. This paper aims to report a bibliometric analysis focused on the pedagogical aspects and teaching/learning strategies employed in data science curriculum design, emphasizing contributions from key authors, publication sources, affiliations, content, and cited documents. The analysis draws on metadata from documents published over a 20-year period (2005–2024), encompassing a total of 1245 documents sourced from the Scopus scientific database. Additionally, a scoping review of 20 articles was conducted to identify key skills, topics, and courses in data science education. The findings reveal a growing interest in the field, with an increasingly multidisciplinary and interdisciplinary approach. Advances in artificial intelligence and related topics, such as linked data, the semantic web, ontologies, and machine learning, are shaping the development of data science curricula. The main challenges in data science education include the creation of up-to-date and competitive curricula, integrating data science training at early educational stages (K-12, secondary schools, pre-collegiate), leveraging data-driven technologies, and defining the profile of a data scientist. Furthermore, the availability of vast amounts of open, linked, and restricted data, along with advancements in data-driven technologies, is significantly influencing research in the field of data science education.

1. Introduction

Data science is characterized by its interdisciplinary nature, bridging fields such as mathematics, statistics, hacking, and computer science, among others (Chao et al., 2020). This cross-disciplinary focus leads professionals from various domains to engage in data science processes, making data science education an increasingly essential component of educational programs across disciplines like information science, mathematics, computer science, psychology, cultural studies, sociology, politics, and more. Furthermore, there is a growing demand for data scientists who can analyze and interpret the vast amounts of data stored and published (both privately and publicly) for various purposes. Individuals interested in learning data science are often focused on understanding how social, economic, political, and other factors influence the data being analyzed (Williams et al., 2021). In this context, some authors suggest that while there is a notable increase in the development of new data science education programs, the pedagogical aspects of these programs have not received sufficient attention (Mike, 2020; G. Shao et al., 2021; Williams et al., 2021). Some authors even recognize that data science education is an emerging scientific discipline (Dogucu et al., 2024; Mike et al., 2023).
While there are many articles on data science in general (not necessarily connected with education), and research studies on how to apply data science in educational contexts are becoming more common, to the best of our knowledge, at the time of writing this paper, there were no other bibliometric analyses or systematic reviews in the field of data science education. There are bibliometric analyses and systematic reviews on data mining, big data, and data science in specific fields, but they focus on how learning and teaching processes can take advantage of these fields to collect, organize, analyze, and extract patterns from data. In the field of data science education, there are few specific publication sources (Hazzan & Mike, 2021). Indeed, in recent years, the Journal of Statistics Education broadened its scope to include data science education, and it is now called the Journal of Statistics and Data Science Education (Horton, 2022).
This study aims to conduct a bibliometric analysis of data science education publications, bringing attention to the curriculum, pedagogy, and teaching/learning strategies in this field. With a bibliometric analysis, many types of articles can be considered and broadly analyzed to draw accurate conclusions about the dynamics of research in a specific field. A bibliometric analysis might benefit other researchers who are looking for trending topics in the field and might also benefit policymakers (Aria & Cuccurullo, 2017). Furthermore, bibliometric studies are key to mapping the state of the art of a particular research field (Oliveira et al., 2019). The data of this study were obtained from the Scopus database, which has been widely adopted for analyzing the existing literature and performing bibliometric analysis in the educational field. Moreover, we selected this database because its contents are well reviewed and selected by an independent content selection and advisory board, and because it has a wide spectrum, covering 330 disciplines and including more than 7000 publishers, more than 24.6 million open access items, more than 97.3 million records, and more than 19 million author profiles (Scopus, 2024).
The main challenges identified in this analysis focus on defining an updated and competitive curriculum for data science, adopting data science training from secondary schools to higher education institutions, promoting the use of data-driven technologies, and describing the main competencies, skills, and abilities for the data scientist professional profile. The results of this analysis can be used by researchers, educators, policymakers, and other stakeholders to make decisions on conducting research in data science education.
This paper is organized as follows: Section 2 describes the related work; Section 3 describes the methodology for conducting the bibliometric analysis; Section 4 presents the results according to the categories analyzed; and Section 5 and Section 6 present the discussion and conclusions, respectively.

2. Related Work

The essence of data science is to apply and develop methodologies and tools that allow us to analyze large datasets in order to obtain meaningful insights in a wide range of contexts (Bonnell et al., 2022). In this context, professionals receive training to achieve those goals when analyzing data. Skills and competencies needed to be a data scientist are being defined as part of the data science profession (Chao et al., 2020; Demchenko et al., 2021b). Such a profession is supported by teaching/learning processes that start as early as secondary school, because today’s world demands that more people, of any age, be skilled in solving problems by applying data-driven techniques (Bai & Hu, 2018; Demchenko et al., 2021a). However, curriculum contents and pedagogical aspects are still under discussion because curricula need to be updated as data availability and dataset sizes increase and as data-driven technologies advance (Demchenko et al., 2021b; Li et al., 2019; Mikroyannidis et al., 2018).
Previous bibliometric analyses have been conducted in related fields such as big data and data mining. For instance, Marín-Marín et al. (2019) conducted a bibliometric review focused on big data in the educational field from the perspective of the vast amount of data generated by students and their use of data in educational contexts. The authors analyzed 1491 documents from different databases; the period of search was 2010–2018. Boztaş et al. (2024) conducted a bibliometric study about educational data mining by analyzing the period 2010–2021. In another bibliometric study, Rao and Chen (2024) used the Scopus database and collected studies about the developments and impact of data mining in educational research. Their period of analysis was 2010–2022, and they selected a total of 1439 documents. However, the focus was on the use of data in educational settings, but not on curriculum development. In the same vein, Samsul et al. (2023) presented a bibliometric analysis, having selected 250 documents from the period 2012–2021 about big data in education and learning analytics. Raban and Gordon (2020) conducted a bibliometric analysis on the evolution of big data and its relationship with data science, but the education aspects of big data and data science were not considered.
Msweli et al. (2023) conducted a scoping review of data science education covering the period 2010–2021, selecting 91 research articles from different databases. The authors highlight the need for pedagogies for data science education and frameworks to guide the development of data science courses at different educational levels. The authors also identify some challenges, such as the fragmentation of the field due to its multidisciplinary nature and the lack of flexible curricula that can be adapted to rapid societal changes. Finally, the authors call for more research on teaching strategies and curricula in this field.
Dogucu et al. (2024) performed a systematic literature review about data science education at undergraduate levels. They collected data from different scientific databases and selected 77 articles. The authors concluded their study with three important recommendations: (1) to see data science as an interdisciplinary field; (2) to use keywords or terminology that make data science studies identifiable in literature searches; and (3) to prioritize investments in data science empirical studies.
Another systematic review analyzed 44 records about big data-driven education evaluation from Springer Link, Wiley, Taylor & Francis, and ScienceDirect from the period 2010 to 2022 (Lin et al., 2024). The study highlights the use of technologies such as machine learning for improving learning processes. In the same vein, Memarian and Doleck (2024) analyzed 23 studies from WoS and Scopus. They found hands-on experiences, active learning, the instructor as a coach, real-world learning experiences, and meaningful learning as pedagogical practices in data science.
There are also some reviews focused on specific locations. For instance, Hsu (2024) conducted a systematic review of 60 data science courses’ syllabi in Taiwan. In this review, the author identified the types of instructional materials used in data science courses as well as the assessment techniques, learning objectives, and competencies. However, the focus of this study was only on syllabi in Taiwan.
In this context, although there are previous bibliometric analyses in related fields, as mentioned before, there is still a lack of bibliometric studies that provide a snapshot of the current state of research on data science education from a more pedagogical and curriculum development perspective. A bibliometric analysis is useful to draw the landscape of research in data science education and provide evidence of the advances in this field as well as the trending topics in the literature.

3. Methodology

In this study, research documents on data science education were analyzed under a bibliometric approach. The word bibliometric refers to the application of statistical methods to identify research trends, top authors, top affiliations, top documents, and other items to provide a general overview of the current research work done in specific fields (Ellegaard & Wallin, 2015). Such methods are applied to research information communicated through papers, books, or other types of research publications. Bibliometric analyses have gained significant importance in recent years for different research topics from the educational field (learning analytics (Azevedo & Azevedo, 2021), blended learning (Raman et al., 2021), assessment and evaluation (Sun et al., 2024), etc.).
The steps defined for this methodology are depicted in Figure 1.
Each of the steps in the methodology is described in the following sections.

3.1. Keywords

The main research field in this study is “data science education”; however, when conducting a basic search on this topic, some of the results refer to the application of data science techniques in the educational field, and other results should be omitted because other keywords are more appropriate for referring to educational processes. Thus, we asked a colleague in the computer science education field about more specific search keywords that lead to results more closely related to “data science education”. Moreover, we consulted other review papers on educational processes in different areas to identify other relevant keywords. As a result, we decided to include the following keywords: data science, education, pedagogy, teaching, curriculum, didactics, learning strategies, professional profile, academic, courses, syllabus, and teachers. The search query used in this study is as follows: (TITLE-ABS-KEY (“data science”) AND (TITLE (“pedagogy”) OR TITLE (“teaching”) OR TITLE (“curriculum”) OR TITLE (“didactics”) OR TITLE (“courses”) OR TITLE (“syllabus”) OR TITLE (“teachers”) OR TITLE (“learning strategies”) OR TITLE (“professional profile”) OR TITLE (academ*) OR TITLE (“education”))). The * is a wildcard, meaning that words that start with the letters before the wildcard will be considered in the search results. The main topic of “data science” was searched for in the article title, abstract, and keywords, whereas words related to “education” (including this word) were searched for only in the article title, to ensure that the documents in the results deal with, or are at least closely related to, the “data science education” field.
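The structure of this query can be made explicit with a short sketch. The Python snippet below is only an illustration of how the string quoted above is composed (one topic clause plus an OR-group of title clauses); the function names `title_clause` and `build_query` are hypothetical helpers, straight quotes are used in place of the typographic quotes printed above, and nothing here calls the Scopus service.

```python
# Title-only terms, in the order given in the search query quoted in the text.
TITLE_TERMS = [
    "pedagogy", "teaching", "curriculum", "didactics", "courses",
    "syllabus", "teachers", "learning strategies", "professional profile",
    "academ*", "education",
]


def title_clause(term: str) -> str:
    """Wrap one term in TITLE (...); wildcard terms are left unquoted."""
    if "*" in term:
        return f"TITLE ({term})"
    return f'TITLE ("{term}")'


def build_query(topic: str, title_terms: list[str]) -> str:
    """Combine the topic clause with an OR-group of title clauses."""
    title_group = " OR ".join(title_clause(t) for t in title_terms)
    return f'(TITLE-ABS-KEY ("{topic}") AND ({title_group}))'


query = build_query("data science", TITLE_TERMS)
print(query)
```

Keeping the title terms in a list makes it easy to see that the topic is matched against title, abstract, and keywords, while every education-related term is restricted to the title.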
Additionally, we conducted a scoping review by using journal articles focused on describing, analyzing, or reporting details about courses contextualized in the data science education field. The aim of this analysis was to identify specific data science courses, topics, and skills reported in the studies. The articles were filtered by specific words included in the title by using the following search string: (TITLE (“data science”) AND (TITLE (“course”) OR TITLE (“syllabus”) OR TITLE (“curricula”) OR TITLE (“curriculum”))).

3.2. Inclusion and Exclusion Criteria

The definition of inclusion and exclusion criteria allowed us to perform an initial screening of the articles after the search process. We defined the following inclusion and exclusion criteria:
  • Inclusion criteria: document type (conference paper, article, review, and book chapter), documents written in English and Spanish, and time frame: 2005–2024.
  • Exclusion criteria: undefined authors (the name of the author is not specified).
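As a rough illustration, the criteria above can be read as a single predicate over document records. The sketch below is an assumption-laden example, not the procedure used in the study (which relied on the filters offered by the Scopus interface): the `Record` class, its field names, and the sample records are hypothetical.

```python
from dataclasses import dataclass

# Inclusion criteria as stated in the text.
ALLOWED_TYPES = {"Conference paper", "Article", "Review", "Book chapter"}
ALLOWED_LANGUAGES = {"English", "Spanish"}
YEAR_RANGE = range(2005, 2025)  # 2005-2024 inclusive


@dataclass
class Record:
    """Hypothetical minimal record; field names are illustrative only."""
    authors: str
    doc_type: str
    language: str
    year: int


def passes_screening(rec: Record) -> bool:
    """Return True if the record meets all inclusion criteria and no exclusion criterion."""
    included = (
        rec.doc_type in ALLOWED_TYPES
        and rec.language in ALLOWED_LANGUAGES
        and rec.year in YEAR_RANGE
    )
    # Exclusion criterion: the author name is not specified.
    excluded = rec.authors.strip() == "" or "undefined" in rec.authors.lower()
    return included and not excluded


# Made-up records used only to exercise the predicate.
records = [
    Record("Doe, J.", "Article", "English", 2021),
    Record("undefined", "Article", "English", 2020),
    Record("Roe, A.", "Editorial", "English", 2019),
    Record("Poe, B.", "Review", "Spanish", 2004),
]
kept = [r for r in records if passes_screening(r)]
print([r.authors for r in kept])
```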

3.3. Search

The Scopus database was selected for this study because it is one of the largest abstract and citation databases and delivers a comprehensive overview of the research results around the world (Scopus, n.d.a). This database is managed by Elsevier, was released for the first time in 1996, and is also highlighted as one of the most established bibliographical sources (Visser et al., 2021), with over 97 million content records (Scopus, n.d.b). Then, the search was conducted using the above-mentioned search query, which retrieved 1354 document results.

3.4. Screening

Since the analysis is focused on applying statistical techniques to the results, a basic screening was conducted by applying inclusion and exclusion criteria using the filters provided by the Scopus database. For instance, we filtered documents published between 2005 and 2024. In the end, 1245 documents were obtained and used to conduct the analysis. To avoid possible errors when using the bibliometric analysis tools, it is important to exclude information that is tagged as “undefined”.

3.5. Analysis

The aim of this study is to provide a conclusive overview of the research conducted around the world in terms of pedagogy, curriculum, and teaching/learning strategies (also involving didactics, courses, syllabi, teachers, learning strategies, professional profiles, and academic items) in the field of data science education. The research questions that this study aims to answer are as follows:
  • RQ1: What is the evolution of research in data science education in terms of annual scientific growth, the countries and affiliations that contribute the most, and the most relevant publication sources?
  • RQ2: What are the academic contributions to the field of data science education from the perspective of authors in terms of the number of publications, keywords, and citations per year?
  • RQ3: What are the trend topics in the data science education field based on content analysis?
  • RQ4: What are the recent advances and future challenges in pedagogy, curriculum, and teaching/learning strategies in the data science education field?
  • RQ5: What are the core courses, skills, and concepts in the data science education field?
Considering the research questions, the categories of information to analyze are listed below:
  • Metadata analysis: a general analysis of the results obtained, considering annual scientific growth (identification of the number of documents published per year), scientific production per country (identification of contributions made by the top 20 countries), production per affiliation (identification of contributions made by the top 20 affiliations), and most relevant sources (identification of contributions made by the top 20 sources of publication).
  • Academic contributions analysis: this analysis allows us to identify the scientific outputs of different authors, where they come from, what their contributions are, and which other authors they cited. This analysis is based on the contributions of authors over time (identification of contributions made by the top 20 relevant authors in each year), relevant keywords (most common words used in the studies), and relations among authors cited in such contributions.
  • Content and document analysis: identification of most common topics with a thematic evolution based on the authors’ keywords. For this category, we used the results from the scoping review about specific data science courses, topics, and skills reported in the studies.
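Several of the categories above reduce to the same operation: counting how often a value (country, affiliation, source) occurs and keeping the 20 most frequent. A minimal sketch of that ranking step is given below; the helper `top_n` and the sample country list are hypothetical, and the study itself obtained these rankings from bibliometrix and VOSviewer rather than from code like this.

```python
from collections import Counter


def top_n(values, n=20):
    """Return the n most frequent values with their counts, most frequent first."""
    return Counter(values).most_common(n)


# Made-up example: one country entry per document.
countries = [
    "United States", "China", "United States",
    "United Kingdom", "China", "United States",
]
print(top_n(countries, n=2))
```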
After conducting the search and applying the inclusion and exclusion criteria, we obtained a total of 1245 results. The results were exported from Scopus into BibTeX and CSV (comma-separated values) files, including citation information (author(s), document title, year, source title, volume, issue, pages, citation count, source, document type, and DOI), bibliographic information (affiliations, serial identifiers, publisher, editor(s), correspondence address, and abbreviated source title), abstracts, keywords (author and index keywords), and references. The files containing this information can be consulted via the repository link in the Data Availability Statement.
In order to analyze the results of this study, we used bibliometrix (Aria & Cuccurullo, 2017), which is an R package for mapping analysis based on scientific literature. It can be installed locally, or it can be used on the web by creating an R project in the R Studio cloud web tool. The other tool selected for this study was VOSviewer (van Eck & Waltman, 2010), which is a tool for generating bibliometric networks; it must be downloaded before use. Both tools are free software.
The BibTeX file was loaded into the bibliometrix tool, and the CSV file was loaded into the VOSviewer tool to analyze bibliometric information, such as scientific production, sources, authors, content, thematic evolution, co-citation, and collaboration.
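For readers who want to inspect such an export without either tool, the CSV file can also be read directly. The sketch below is not part of the study's workflow; it assumes column headers such as `Authors`, `Year`, `Cited by`, and `Author Keywords`, semicolon-separated authors and keywords, and a made-up two-row excerpt in place of the real file. Check the headers of the actual export before relying on these names.

```python
import csv
import io

# Made-up two-row excerpt standing in for the exported CSV file.
SAMPLE_EXPORT = '''Authors,Title,Year,Source title,Cited by,Document Type,Author Keywords
"Doe, J.; Roe, A.",Teaching data science,2021,Example Journal,12,Article,data science; curriculum
"Poe, B.",A data science syllabus,2019,Example Proceedings,,Conference paper,data science; pedagogy
'''


def load_records(handle):
    """Parse export rows into dictionaries with typed and split fields."""
    records = []
    for row in csv.DictReader(handle):
        records.append({
            "authors": [a.strip() for a in row["Authors"].split(";")],
            "title": row["Title"],
            "year": int(row["Year"]),
            "source": row["Source title"],
            # An empty citation cell is treated as zero citations.
            "citations": int(row["Cited by"] or 0),
            "doc_type": row["Document Type"],
            "keywords": [
                k.strip().lower()
                for k in row["Author Keywords"].split(";")
                if k.strip()
            ],
        })
    return records


records = load_records(io.StringIO(SAMPLE_EXPORT))
print(records[0]["authors"], records[0]["year"], records[0]["keywords"])
```

With a real file, `io.StringIO(SAMPLE_EXPORT)` would be replaced by `open(path, newline="", encoding="utf-8")`.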
The last step was to show and then discuss the results obtained after retrieving all the artifacts from each one of the listed categories of information. As a result, detailed information will be given about the recent advances and future challenges in pedagogy, curriculum, and teaching/learning strategies in the data science education field. Section 4 shows the results from this study, and Section 5 presents the discussion of those results.

4. Results

This section is organized according to the categories of information described in the previous section (metadata analysis, analysis of authors, and content and document analysis).

4.1. Metadata Analysis

This section presents the metadata analysis in terms of the main information from the dataset, research areas, annual scientific growth, scientific production per country, most relevant affiliations, and most relevant publication sources.

4.1.1. Dataset Description

Table 1 shows the quantities of items, such as sources, documents, references, average citations per document, average citations per year per document, document types, document contents, and authors.

4.1.2. Research Areas

Figure 2 shows the distribution of documents published in different areas. Computer sciences (26%), social sciences (24%), engineering (12%), and mathematics (9%) are the most preferred publication areas in the data science education research field. There are other areas that also catch the attention of researchers, such as the decision sciences, medicine, physics and astronomy, business, management and accounting, environmental sciences, psychology, etc.

4.1.3. Annual Scientific Growth

Figure 3 depicts the number of documents published per year. The average number of documents published every year is 62.25; however, since 2016, more than 30 documents have been published every year. Moreover, the period 2016–2024 had the highest productivity, with 1212 documents published, and 2021 was the year with the highest number of publications, with 232 documents. This result suggests an increasing research interest in the data science education field in recent years.
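The figures in this paragraph can be checked with simple arithmetic. The short sketch below uses only the numbers reported above; the percentage shares are derived values that the text does not state explicitly, and the variable names are chosen for illustration.

```python
# Figures reported in the text.
TOTAL_DOCUMENTS = 1245
FIRST_YEAR, LAST_YEAR = 2005, 2024
DOCS_2016_2024 = 1212
DOCS_2021 = 232

years_covered = LAST_YEAR - FIRST_YEAR + 1            # 20 years
average_per_year = TOTAL_DOCUMENTS / years_covered    # documents per year
share_2016_2024 = DOCS_2016_2024 / TOTAL_DOCUMENTS    # fraction published in 2016-2024
share_2021 = DOCS_2021 / TOTAL_DOCUMENTS              # fraction published in 2021

print(f"average per year: {average_per_year:.2f}")    # 62.25
print(f"share 2016-2024:  {share_2016_2024:.1%}")     # 97.3%
print(f"share 2021:       {share_2021:.1%}")          # 18.6%
```

The average of 62.25 is heavily influenced by the last nine years, which account for almost all of the documents in the dataset.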
A possible explanation for the recent interest in the data science education field is the increasing demand for more data scientist professionals to manage the large amounts of data available in public or private domains and the velocity attributed to big data (De Veaux et al., 2017).
Table 2 lists three relevant articles from the per-year classification. The first one focused on the applications of machine learning for image analysis, the second one on bioprocessing, and the third one on data science education. The third article in this list is directly related to the main topic of our bibliometric analysis: data science education from a pedagogical perspective (Mike, 2020). The author introduces a framework with ten pedagogical challenges in data science education, divided into three categories: (1) discipline (interdisciplinarity, data domain, concept comprehension, cognitive load), (2) skills (non-technical skills, ethics, research skills), and (3) environment (real-life tasks, learners, data science teachers) (Hazzan & Mike, 2020). Articles along these lines contribute to the body of knowledge needed to construct a reliable curriculum in data science education.

4.1.4. Scientific Production and Country Collaboration

The United States is the country contributing the most to the field of data science education, with 523 documents, which is nearly four times the contribution of the second country, China, with 133, followed by the United Kingdom, with 98 documents published. The rest of the top 20 most contributing countries have published between 14 and 50 documents. Information about the top 20 most contributing countries and regions is depicted in Figure 4. Moreover, Figure 5 depicts the collaboration among different countries, highlighting again the United States, China, and the United Kingdom as the countries that collaborate most with other countries in this field. Table 3 lists three relevant articles from each of the three most contributing countries. For the United States, the first article is focused on academic libraries, the second one addresses gaps in data science education in the United States, and the third one deals with integrating data science in an educational information technology course. In China, the three relevant articles focus on curriculum and competencies in data science courses. This sample of articles shows that the main concerns are the definition of course curricula as well as the competencies and skills of data scientist professionals.

4.1.5. Most Relevant Affiliations

Figure 6 shows contributions from the top 20 most relevant affiliations. On average, these affiliations have published nine documents each. Eight of the institutions have more than 10 documents published, whereas the others have fewer than 10. Stanford University, the University of California, Berkeley, and the Universidad Autónoma de México are the three most relevant affiliations, with 21, 20, and 16 documents, respectively.
Table 4 lists the three relevant articles from Stanford University. Those articles address topics such as curricula, professional practice, STEM, and analysis of areas such as data science and machine learning in health professions.

4.1.6. Most Relevant Sources

Figure 7 shows the top 20 most relevant sources of publication in the data science education field. Seven of them are journals, and the others are books or proceedings series. The Journal of Physics is the most relevant source, with 54 documents, although these appear in its conference series; the second most relevant journal is the Journal of Statistics and Data Science Education, with 35 documents, followed by the Computers & Education journal, with 21 documents. In the case of conference proceedings, the most relevant is the ACM International Conference Proceedings Series, with 51 documents; the second most important is the ASEE Annual Conference and Exposition, with 38 documents, followed by the Annual Conference on Innovation and Technology in Computer Science Education, with 24 documents. When examining the most relevant articles from the Journal of Physics: Conference Series (the first in the list), we found that they deal with the application of data science techniques to different educational contexts. Then, we reviewed the next relevant source, the ACM International Conference Proceedings Series, and we found that, out of the three relevant articles that are more related to the data science education topic, two appeared as relevant in previous tables (Table 2, Table 3 and Table 4) of most relevant articles (Bai & Hu, 2018; Demchenko et al., 2021b). The third one deals with the definition of a semantically enriched curriculum by using an ontology of data science competencies and body of knowledge (Demchenko et al., 2019).

4.2. Analysis of Authors

This section presents a brief analysis of the authors who contributed the most, as well as an analysis of the keywords with the most occurrences in the dataset analyzed.

4.2.1. Most Contributing Authors

Figure 8 shows a chart with the top 20 authors’ production over time. The most contributing authors were measured according to the number of documents published and the citations received. The larger the circle, the higher the number of publications, and the darker the circle, the higher the number of citations per year. In that regard, this measure does not include the most cited articles in the field. Instead, it considers publication quantity and the number of citations received per author. Here, SALAS-RUEDA is the most contributing author, with 15 articles published (1 in 2019, 3 in 2020, 6 in 2021, 6 in 2022, 1 in 2023, and 2 in 2024), and CAR, J. is the most cited author, with at least 298 citations. Table 5 presents more details for the top 20 authors.

4.2.2. Analysis of Authors’ Keywords

Figure 9 is a co-occurrence network of authors’ keywords. It shows that, in recent years (2019 and beyond), the topics of data science and education appear to be connected both to terms associated with pedagogical aspects (such as curriculum, curriculum design, pedagogy, teaching, active learning, data literacy, learning analytics, assessment, and online education, among others) and to terms associated with technical or disciplinary topics (among which are big data, data mining, statistics, information science, machine learning, data visualization, and computational thinking). Moreover, there are some terms that are currently hot topics in most research areas, such as the use of artificial intelligence tools (e.g., ChatGPT). This result suggests that recent research has focused on investigating the pedagogical aspects of teaching data science and computational thinking.
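To make the notion of keyword co-occurrence concrete, the sketch below counts how often two author keywords appear in the same document; these counts correspond to edge weights in a network of this kind. It is a simplified illustration under stated assumptions: the function name `cooccurrence` and the sample keyword lists are hypothetical, and the layout and clustering performed by VOSviewer are not reproduced here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(keyword_lists):
    """Count, for each unordered keyword pair, the documents containing both."""
    pairs = Counter()
    for keywords in keyword_lists:
        # Normalize case and whitespace, and count each keyword once per document.
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        pairs.update(combinations(unique, 2))
    return pairs


# Made-up author keywords for three documents.
docs = [
    ["data science", "curriculum", "pedagogy"],
    ["data science", "curriculum"],
    ["Data Science", "machine learning"],
]
edges = cooccurrence(docs)
for (a, b), weight in edges.most_common():
    print(f"{a} -- {b}: {weight}")
```

Sorting the keywords before pairing ensures that each pair is stored under a single key regardless of the order in which the keywords were listed.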
To have more details about this co-occurrence network, the VOSviewer tool arranged the keywords by clusters, and each cluster is a set of related nodes. For this analysis, the tool identified nine clusters as follows:
  • Cluster 1: the focus in this cluster is educational technologies and analysis, which connects 17 topics (bibliometrics, blended learning, curriculum design, data sciences, design thinking, e-learning, educational data science, educational innovation, engineering education, higher education, interdisciplinarity, learning analytics, natural language processing, skills, teaching and learning, text mining, and virtual reality). According to these topics, teaching and learning strategies in educational data science involve the development of skills for conducting analysis by integrating approaches such as learning analytics, natural language processing, and text mining, among others. Moreover, there are other technologies that integrate data science elements (e.g., virtual reality allows developers to capture data from users’ interactions, and these data can be analyzed for multiple purposes).
  • Cluster 2: in this case, the focus is also on the educational setting, but is more related to computing and programming education and tools or techniques for data analysis. This cluster connects 17 topics (academic performance, ChatGPT, computing, computing education, course design, curricula, data mining, data science, deep learning, ethics, experimental learning, gamification, programming, programming education, Python, R, visualization). There are some aspects of curriculum design in computing education, such as course design, the curricula, and some teaching and learning strategies (e.g., gamification and experimental learning). Moreover, some of the mentioned skills, tools, or techniques for data analysis are programming, data mining, deep learning, Python, R, and visualization.
  • Cluster 3: this cluster addresses the topic of educational data science and its application in different settings. It includes 16 topics (AI, artificial intelligence, assessment, continuing education, curriculum, cybersecurity, data analysis, educational data mining, health informatics, IoT, medical education, pedagogy, professional development, research, STEM, training). Some of the application settings are health, medicine, and cybersecurity.
  • Cluster 4: the main topic in this cluster is the skills and abilities for data science. It involves 15 topics (analytics, collaboration, computational thinking, computer science education, data ethics, data literacy, data management, data science education, digital transformation, information visualization, open data, statistical literacy, statistics education, statistics education research, teaching statistics). These topics reflect the learning path of someone who wants to dive deep into the data science world.
  • Cluster 5: this cluster focuses on teaching strategies and pedagogical issues. This cluster has 12 topics (data science applications in education, distance education and online learning, evaluation methodologies, improving classroom teaching, learning communities, lifelong learning, pedagogical issues, postsecondary education, secondary education, statistical computing, teacher professional development, teaching/learning strategies). All these topics refer to the work of data science teachers.
  • Cluster 6: the main topic in this cluster is online technologies and platforms for data science education. It involves 10 topics (big data technology, data analytics, data visualization, Jupyter Notebook, MOOCs, online education, online learning, project-based learning, simulation, undergraduate education). Online education and online learning make it possible to pursue data science programs at different levels, as is the case in undergraduate education, and massive open online courses (MOOCs) offer the opportunity to pursue educational programs in an online setting.
  • Cluster 7: this cluster deals with interdisciplinary approaches that contribute to the data science education field, and, particularly, with curriculum design. This cluster is composed of nine topics (big data, cloud computing, computer science, curriculum development, the EDISON Data Science Framework, education, informatics, interdisciplinarity, statistics). Here, interdisciplinarity plays an important role because the data science education field does not function as a singular field; it depends on other fields that complement the educational path for the data scientist.
  • Cluster 8: this cluster addresses technology in education in the new era and how it has been influenced by health and societal issues such as COVID-19. Eight topics are included in this cluster (COVID-19, educational technology, ICT, learning, machine learning, MOOCs, teaching, technology). The MOOC term appears again in this cluster, as an indicator of platforms that serve online education.
  • Cluster 9: this cluster has an analytical focus. It includes six topics (active learning, big data analytics, business intelligence, content analysis, information sciences, information technology). These topics are related to techniques for data analysis that are also part of the knowledge that a data scientist should acquire in the learning process.
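The clusters above group author keywords that tend to appear together in the same documents. As a rough illustration of that underlying idea only, and not of the tool or clustering algorithm used in this study, the following Python sketch counts pairwise keyword co-occurrences. The sample records and the function name `cooccurrence_counts` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(records):
    """Count how often each unordered pair of keywords shares a document.

    `records` is an iterable of keyword lists, one list per document.
    Keywords are lower-cased and de-duplicated within a document.
    """
    counts = Counter()
    for keywords in records:
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        for a, b in combinations(unique, 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical author-keyword lists (not taken from the analyzed dataset).
sample_records = [
    ["Learning analytics", "Text mining", "Higher education"],
    ["learning analytics", "higher education"],
    ["Python", "R", "Visualization"],
]

pair_counts = cooccurrence_counts(sample_records)
```

A clustering step would then operate on these pair counts; which algorithm to use for that step is left open here.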

4.2.3. Pedagogical Aspects and Teaching/Learning Strategies

Figure 10 shows a density visualization map of topics connected with teaching/learning strategies and pedagogical aspects in data science education. In this figure, terms shown in yellow are the most frequent, while terms shown in blue are less frequent. Most of the terms refer to the hard skills and knowledge that students should acquire (big data, learning analytics, natural language processing, data mining, deep learning, artificial intelligence, machine learning, text mining, business intelligence). Other topics are more in line with processes for managing data (data analysis, data literacy, data analytics, data visualization, data ethics). There are also some topics of a transdisciplinary nature, for instance, those related to the field of computer science or information science (programming, cloud computing, statistical computing) or statistics (statistics education). Moreover, some of the topics deal with more pedagogical aspects of data science education, either general aspects (evaluation methodologies, pedagogy, teaching/learning strategies, teacher professional development) or aspects related to the educational setting (higher education, continuing education, secondary education, undergraduate, online learning, blended learning, learning communities, lifelong learning).
Regarding the teaching/learning strategies in data science education, this analysis revealed that active learning, experiential learning, project-based learning, computational thinking, gamification, and design thinking are the most frequently adopted strategies.

4.3. Content and Document Analysis

This section presents the results in terms of thematic evolution and the analysis conducted to identify courses, topics, and skills reported in different studies.

4.3.1. Thematic Evolution

The thematic evolution of the data science education field is depicted in the Sankey diagram of Figure 11. This diagram has a cutting point in 2016, which is the year after which more documents have been published in the data science education field. The diagram is divided into three stages: 2005–2020, 2021–2022, and 2023–2024. It also provides an overview of trending topics in data science education. Data science education research comes from studies dealing with online education, data analytics, and learning analytics. Some recent topics are data visualization, text mining, learning analytics, the management of analytical tools, and artificial intelligence.
A thematic map allows the identification of emerging (or declining), niche, basic, and motor themes. According to the thematic map from Figure 12, emerging topics in this field (upper-left quadrant) are data visualization and computational thinking. The well-developed and important topics (upper-right quadrant) in data science education include big data and machine learning. The transversal topics (basic themes in the lower-right quadrant) include artificial intelligence and computer science education.
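The three stages used in the Sankey diagram amount to a simple binning of publication years. The sketch below assigns years to the stages named above and tallies keywords per stage. It is only an illustration under stated assumptions: the records are hypothetical, and the names `stage_label` and `keywords_by_stage` are not part of the authors' workflow.

```python
from collections import Counter, defaultdict

# Stage boundaries as stated in the text (inclusive years).
STAGES = [(2005, 2020), (2021, 2022), (2023, 2024)]


def stage_label(year):
    """Return the 'start-end' label of the stage containing `year`, or None."""
    for start, end in STAGES:
        if start <= year <= end:
            return f"{start}-{end}"
    return None


def keywords_by_stage(records):
    """Tally lower-cased keywords per stage from (year, keywords) pairs."""
    tallies = defaultdict(Counter)
    for year, keywords in records:
        label = stage_label(year)
        if label is not None:
            tallies[label].update(k.lower() for k in keywords)
    return tallies


# Hypothetical records for illustration only.
sample = [
    (2016, ["online education", "learning analytics"]),
    (2021, ["data analytics"]),
    (2024, ["artificial intelligence", "data visualization"]),
    (2024, ["Artificial intelligence"]),
]
tallies = keywords_by_stage(sample)
```

Comparing the most frequent keywords of consecutive stages gives a first, coarse view of how themes shift over time.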

4.3.2. Courses, Topics, and Skills in Data Science Education

A scoping review was conducted to identify courses, topics, and skills in data science education. Using the search query (TITLE (“data science”) AND (TITLE (“course”) OR TITLE (“syllabus”) OR TITLE (“curricula”) OR TITLE (“curriculum”))), 44 journal articles were found. After screening each of the articles and excluding those without access to the full text and those that did not present specific information about courses, topics, or skills, 20 articles were analyzed. To do so, the names reported in each article for the categories of courses, topics, and skills were registered in a table. The file containing this information can be consulted as supporting information in the Data Availability Statement.
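The Boolean logic of the title query above can be sketched as a small filter. This is only an approximation for illustration: it uses case-insensitive substring checks, whereas the database's own matching rules may differ, and the sample titles and the function name `matches_query` are hypothetical.

```python
# Terms taken from the title query stated in the text.
REQUIRED_PHRASE = "data science"
ANY_OF = ("course", "syllabus", "curricula", "curriculum")


def matches_query(title):
    """Approximate the title query with case-insensitive substring checks."""
    text = title.lower()
    return REQUIRED_PHRASE in text and any(term in text for term in ANY_OF)


# Hypothetical titles for illustration only.
titles = [
    "Designing a data science curriculum for undergraduates",
    "A data science course for secondary schools",
    "Machine learning syllabus design",
    "Data science in practice",
]
selected = [t for t in titles if matches_query(t)]
```

Only the first two sample titles satisfy both conditions: the required phrase and at least one of the alternative terms.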
As a result of the scoping review, we identified the analytical and knowledge skills in data science education. Figure 13 summarizes these skills. On the one hand, analytical skills involve processes such as identification, interpretation, data curation, and data visualization. On the other hand, knowledge skills are more related to acquiring knowledge and specific skills on different topics (e.g., programming languages, databases, tools and commands, machine learning, deep learning, simulation, and data structures, among others).
Figure 14 depicts a word cloud of the common course names in the data science education field identified in the scoping review. This word cloud was created from the full names of the courses, excluding some isolated words, such as “analysis”, “computer”, “technology”, “course”, “courses”, “data”, “introduction”, “human”, and “machine”. Common courses mentioned are Data Science, Introduction to Data Science, Data Visualization, Machine Learning, and Programming, among others.
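The frequency counts behind such a word cloud can be sketched as follows. The code assumes one plausible reading of the procedure described above, namely that full course names are counted as units and entries consisting only of an excluded isolated word are dropped. The sample course list and the function name `course_name_frequencies` are hypothetical.

```python
import re
from collections import Counter

# Isolated words excluded from the course-name word cloud, as listed in the text.
EXCLUDED = {
    "analysis", "computer", "technology", "course", "courses",
    "data", "introduction", "human", "machine",
}


def course_name_frequencies(course_names):
    """Count full course names, skipping names that are only an excluded word."""
    counts = Counter()
    for name in course_names:
        # Collapse repeated whitespace and ignore letter case.
        normalized = re.sub(r"\s+", " ", name).strip().lower()
        if normalized and normalized not in EXCLUDED:
            counts[normalized] += 1
    return counts


# Hypothetical course names for illustration only.
sample_courses = [
    "Data Science",
    "Introduction to Data Science",
    "Machine Learning",
    "data  science",
    "Data",
    "Course",
]
frequencies = course_name_frequencies(sample_courses)
```

The resulting name-to-count mapping is the kind of input a word cloud renderer typically expects, with larger counts drawn in larger type.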
Figure 15 depicts the word cloud for the topics addressed in the different studies. The main topics are data science, data collection, data visualization, machine learning, programming languages, visualization, data management, data cleaning, statistical analysis, and data scraping, among others. Thus, the main topics focus on the development of knowledge and skills to manage data. In this category, we also noticed that another important topic in the data science courses is the ethical aspects of data.
Figure 16 depicts the word cloud of skills mentioned in the articles. These are the skills properly associated with the discipline of data science and that come from areas like statistics, computing, or mathematics, or from specific domains. Moreover, some authors highlight the importance of involving soft skills such as effective communication to present information properly, and the need for people capable of solving problems in different contexts and professionals who can make good decisions based on the data analyzed.

5. Discussion

In this section, the discussion is organized according to each research question defined for this study.

5.1. RQ1: What Is the Evolution of Research in Data Science Education in Terms of Annual Scientific Growth, the Countries and Affiliations That Contribute the Most, and the Most Relevant Publication Sources?

In the field of data science education, this review reports 1245 documents published between 2005 and 2024. The United States is the country that contributes the most, and the University of California is the affiliation that contributes the most. Moreover, the most relevant source of publication is the Journal of Physics: Conference Series. Overall, the results show that the publications in the field have risen steadily year by year. Most of the documents have been published since 2016, highlighting the emerging research interest in topics related to this field.
Looking at the evolution of data science education since 2016, we found some key strategies to improve the capacity of higher education institutions to recruit students in the data science and analytics fields (Fitzgerald et al., 2016). For instance, inquiry-based learning needs to be addressed in statistics courses to strengthen students’ decision-making and analytical abilities (Golubski, 2016). This can be achieved by integrating open-source tools into the curriculum for data engineering and data science (Drummond, 2016), such as the InfoQ framework (Kenett & Shmueli, 2016). Other strategies include the creation of a customizable data science education curriculum and the use of virtual labs for services in demand (Demchenko et al., 2017). It is noteworthy that some data science curricula are based on the EDISON Data Science Framework (Wiktorski et al., 2017) and foster the communicative power of data visualization, applying techniques such as data storytelling (V. Echeverria et al., 2017), problem solving in educational settings (Klašnja-Milićević et al., 2017), and the use of tools to train students in data science (e.g., the DataLab tool (Zhang et al., 2017)), among others.
Since 2018, there has been an increasing interest in testing and validating data science curricula (West, 2018) and considering some relevant topics to adjust or include in data science courses (the problem-driven approach (Bai & Hu, 2018); curricula for data science in Europe (Mikroyannidis et al., 2018); new curriculum proposals (Kauermann & Seidl, 2018); ethics in data science curricula (Saltz et al., 2018); data science curricula for secondary schools (Heinemann et al., 2018); statistical curricula as a driver for teaching computing (Çetinkaya-Rundel & Rundel, 2018; Kaplan, 2018); open environments for data science education (Hoyt & Wangia-Anderson, 2018); guides on how to teach data science (Hicks & Irizarry, 2018), etc.).
In terms of the implementation of data science curricula, it is important to learn from case studies of successful implementations, which is possible when educators share their experiences with data science courses (Bile Hassan et al., 2021; Donoghue et al., 2021; Labou et al., 2019). Once a course is implemented, concerns arise about the appropriate application of pedagogical aspects. The existing literature on data science education indicates that an interdisciplinary approach, in which either several instructors from different disciplines teach a course or one instructor covers different topics, has led to better-developed data science courses (Asamoah et al., 2020). Pedagogical aspects also involve topics that are part of the content of data science courses and others that allow educators to improve the curriculum. These include the use of general but specialized programming languages (e.g., Python (Friedman, 2019), NPL (Walker, 2024)), the use of academic libraries in data science courses (G. Shao et al., 2021), the use of semantic web techniques such as ontologies for customized data science curricula (Demchenko et al., 2019), the development of competencies and assessment for a data scientist professional (Demchenko et al., 2021b), training in statistics, research methods, and data-driven inquiry skills (Burr et al., 2021; Perron et al., 2020), the use of data science models (e.g., the Conway model) for information schools (Hagen, 2020), considering foundations in data science and the workflow of data in data science education at different levels (Frischemeier et al., 2021; Merryman & Lu, 2021; Mike et al., 2020; Tamai et al., 2021), involving other application areas such as political sciences (Williams et al., 2021), etc.
Regarding the method of imparting knowledge in data science courses, some studies highlight the benefits of using distance learning settings (Rampure et al., 2021; Sakamaki et al., 2022) and that there are ethical aspects that need to be considered, such as data quality, the diversity of algorithms, data transparency, and privacy and data protection (Lewis & Stoyanovich, 2021).
Some authors have also been interested in theory, methods, and applications in a teaching ecosystem for data science (Tobar et al., 2021). However, there are some challenges and issues to attend to (Bonnell et al., 2022). For instance, artificial intelligence tools are increasingly becoming allies for multiple purposes (Ki Kim et al., 2023), but their use is disrupting and changing the way different analytical processes are conducted.
All the cited studies come from the dataset analyzed in this study. The topics addressed in these and other studies focus mainly on reporting research scenarios, experiences, reviews, or case studies that deal with curriculum improvement, implementation of data science courses, and the use of different tools and strategies to promote data science training.

5.2. RQ2: What Are the Academic Contributions to the Field of Data Science Education from the Perspective of Authors in Terms of the Number of Publications, Keywords, and Citations per Year?

The most contributing authors have published articles in the data science education field since 2016. Although Salas-Rueda is the most contributing author, his papers address data science from the perspective of the application of this field to the analysis of different contexts, such as social media (Salas-Rueda, 2021) or virtual teaching environments (Salas-Rueda et al., 2020). Meanwhile, the contributions from Demchenko, Williamson, and Horton deal directly with the specific topic of pedagogical challenges and issues in data science education. For instance, Demchenko addresses topics such as customizable data science education curricula (Demchenko et al., 2017, 2019), instructional models (Demchenko et al., 2014), a data science framework called EDISON (Demchenko et al., 2021a), competencies and assessment for data science (Demchenko et al., 2021b), and the use of ontologies in the data science curriculum (Demchenko et al., 2019). Williamson addresses the topic of data science in the digital governance of education (Williamson, 2015). Finally, Horton deals with data science in statistics curricula (Hardin et al., 2015), data science in postsecondary education (Kloefkorn et al., 2020), and computational thinking in the statistics and data science fields (Horton & Hardin, 2021).

5.3. RQ3: What Are the Trend Topics in the Data Science Education Field Based on Content Analysis?

The co-occurrence chart of authors’ keywords depicted in Figure 9 allowed us to easily check whether the documents analyzed in this bibliometric review were highly related to the data science field. In this context, teaching, data literacy, computer science, data analytics, data visualization, text mining, learning analytics, cloud computing, statistics, Python, educational data mining, curriculum design, project-based learning, and training are words closely related to the main topics of data science education. Some topics that have appeared more frequently in recent years are data science applications, artificial intelligence, machine learning, computer science education, and data literacy. This result suggests that these are the topics related to data science education that have recently attracted the attention of researchers.
The thematic map in Figure 12 also shows two trend topics in data science education: computational thinking and data visualization. Computational thinking is one of those pedagogical strategies that contribute to the development of skills in data science education. It is a kind of scientific method that allows one to solve real-life problems that can be complex or messy and to develop different skills in four levels (Berikan & Özdemir, 2020): level 1—scientific thinking (collecting data, analyzing data, representing data); level 2—algorithmic thinking (problem decomposition); level 3—abstraction (pattern recognition, parallelization); level 4—automation (technical skills, computer science techniques and tools, computing capacity).
Regarding data visualization, Fernandez et al. (2024) point out that there is a need for more tools to support data science students in the process of constructing data visualizations and suggest a framework for analyzing and discussing educational tools for data visualization. Another aspect that deserves more research is the need to investigate how students create their own visualizations and interpret them (Fernandez et al., 2024).

5.4. RQ4: What Are the Recent Advances and Future Challenges in Pedagogy, Curriculum, and Teaching/Learning Strategies in the Data Science Education Field?

Over the course of the field’s evolution, the topics mentioned above highlight key challenges addressed by various studies, including pedagogical aspects, the development of effective curricula, the use of frameworks and tools, and the adoption of diverse teaching and learning strategies (such as problem-based and inquiry-based learning, statistics-focused learning, and laboratory work, among others).
Although data science courses are typically offered in undergraduate programs, there is growing evidence that high schools and pre-collegiate levels are also working to adopt data science courses, but the topic is still in its early stages and under development (Fernandez et al., 2024). The goal is to raise awareness among students about the vast amounts of data available on the internet, which can be analyzed to find solutions to a wide range of societal problems.
Ethics, competencies, contents, assessment, and other elements are part of the data science profession. In this context, it is important to note that data science professionals are in high demand in today’s society. These topics are addressed in the EDISON framework (Demchenko et al., 2021a), a framework that defines data science as a profession and that has been applied in other studies that deal with course design or curricular approaches (Wiktorski et al., 2017).
Figure 13 depicts some of the analytical and knowledge skills in data science education that were identified by Bonnell et al. (2022). These skills involve the management of problems and tasks to analyze data, interpret results, provide solutions, and show results.
The growing demand for data science in research and industry has increased the need for professionals in the field. However, there is a significant gap between this demand and the training of data science experts. To address this issue, strategies such as massive open online courses (e.g., Coursera), online platforms (e.g., LinkedIn Learning), and degree/certificate programs are being implemented. More and more universities and educational institutions, including colleges, are making efforts to offer programs that incorporate data science components.
One of the main concerns, which remains an open issue, is the ongoing discussion about the pedagogical aspects of training, learning scenarios, and curricula focused on data science education. From the content analysis, it was possible to identify some of the teaching/learning strategies adopted in data science education, which involve active learning, experiential learning, project-based learning, computational thinking, training, gamification, and design thinking. In this context, Williams et al. (2021) emphasized the importance of collaboration in data science practices and argued that both substantive and methodological courses can help students understand the social, economic, and political aspects that impact real-world data. Similarly, G. Shao et al. (2021) highlighted the crucial role that libraries play as a strategy to enhance data science courses. Msweli et al. (2023) found that the key teaching strategies used in data science education are the following: competency-based learning, the use of technology, and teacher-led and project-based learning. However, other well-known strategies, such as flipped classrooms and inquiry-based learning, are not widely used in data science education and can be further explored to identify their affordances for the field.
An important aspect of a data science curriculum is its multidisciplinary nature (Demchenko et al., 2017), which suggests the need for curricula that are not solely focused on computer or information sciences and the need for a wide variety of teaching strategies that can be aligned with different disciplines (Msweli et al., 2023). In that regard, future data science curricula should also consider the diversity of students’ backgrounds, preferences, motivations, and interests. Supporting this idea, Msweli et al. (2023) suggest three challenges associated with teaching/learning strategies and curricula in data science education: (1) the lack of multidisciplinary teaching strategies, (2) the need to teach diverse audiences, and (3) the lack of standardization and accreditation in data science curricula.

5.5. RQ5: What Are the Core Courses, Skills, and Concepts in the Data Science Education Field?

The results of the analysis conducted to identify key courses, skills, and concepts in the data science education field revealed important concepts from the word clouds generated. An important finding was that some authors have addressed the challenge of identifying the current content, lessons, structures, and other pedagogical aspects of existing data science courses (Lee & Delaney, 2022). One of the articles reviewed presented a comprehensive curriculum that outlined a set of constructs guiding the specific topics included (Shea et al., 2019). The constructs are domain, ethics, theory, technical, analytics, and dissemination (this one is understood from the perspective of visualization).
Regarding the topics, an important finding was that some authors include a dedicated module on ethical issues, highlighting the importance of raising awareness about the data being manipulated and shared in various formats (Friedman, 2019; Lee & Delaney, 2022; Schwab-McCoy et al., 2021; G. Shao et al., 2021; Shea et al., 2019; Zhang et al., 2022).
It is also noteworthy that all the articles refer to at least one skill related to the data science field. Some authors emphasize the importance of data science skills as an inherent component of curriculum design. These skills encompass the ability to manage various components, such as data, tools, domain knowledge, and personal competencies. Skills are integrated into the curriculum to ensure that professionals are well prepared to address real-world societal problems and meet the demands of the job market. For example, programming and statistical skills are highly valued by potential employers in the data science field (Guzman et al., 2019). Furthermore, the combination of analytical and soft skills is crucial due to the transdisciplinary nature of the professional demands of future data scientists (Christozov et al., 2019).

6. Conclusions

After conducting the bibliometric analysis and scoping review, we can conclude that there is a growing interest in promoting data science education programs, with curricula being analyzed and adapted in response to societal demands and the evolution of data-driven technologies. The analysis of various studies allowed us to highlight key contributions in terms of time, countries, affiliations, publication sources, authors, and content. It also provided insights into the specific areas of focus addressed by some of the studies. In this regard, this study paves the way for more focused reviews, such as systematic reviews or surveys, centered on the trending topics identified in this paper. These reviews can further explore research challenges and opportunities in the field of data science education. The results reveal an increasing interest in the field in recent years, presenting opportunities for further research on the diverse topics within data science education.
According to several authors, statistics and mathematics should be integrated into data science curricula due to their analytical nature and the techniques they provide to ease the process of data analysis. Additionally, collaboration, as well as multidisciplinary and interdisciplinary approaches, should be considered when selecting learning activities for the data science education curriculum (Asamoah et al., 2020; Chao et al., 2020; Huppenkothen et al., 2018; Mike et al., 2020).
Emerging topics such as the semantic web, ontologies, linked data, and other specific areas from the field of artificial intelligence are being integrated into data science education curricula. These topics help identify techniques for data analysis and serve as strategies to enhance the curriculum itself. They are combined with foundational topics commonly included in the curriculum, such as the basics of data science, statistical and data mining techniques, databases, visualization, and ethical considerations, among others.
It is also important to incorporate best practices for open data into the data science curriculum, as a means to share scientific knowledge and benefit others through the widespread availability of data. This issue is also addressed from an ethical perspective, which is increasingly being considered in all data science program curricula. Data science can maximize its impact by promoting reproducibility through strategies such as sharing datasets, open-sourcing code, and making teaching materials accessible (Gymrek & Farjoun, 2016).
There are still many unanswered questions about data science education and how curricula can be designed effectively to develop the essential skills, competencies, and abilities in future data science professionals.

Author Contributions

Conceptualization, C.A.-G. and J.B.-A.; methodology, C.A.-G.; formal analysis, C.A.-G. and J.B.-A.; writing—original draft preparation, C.A.-G.; writing—review and editing, C.A.-G. and J.B.-A.; All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Fundación Universitaria Konrad Lorenz, Grant number 5INV6201.

Data Availability Statement

The original data presented in the study are openly available in the OSF repository at https://osf.io/uf5pc/?view_only=efee3a88bc4f4768b160c98cda11e9e4.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Aria, M., & Cuccurullo, C. (2017). bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11(4), 959–975.
  2. Asamoah, D. A., Doran, D., & Schiller, S. (2020). Interdisciplinarity in data science pedagogy: A foundational design. Journal of Computer Information Systems, 60(4). Available online: https://recursosvirtuales.konradlorenz.edu.co:2418/doi/abs/10.1080/08874417.2018.1496803 (accessed on 5 November 2024).
  3. Azevedo, A., & Azevedo, J. M. (2021). Learning analytics: A bibliometric analysis of the literature over the last decade. International Journal of Educational Research Open, 2, 100084.
  4. Baako, T.-M. D., Kulkarni, S. K., McClendon, J. L., Harcum, S. W., & Gilmore, J. (2024). Machine learning and deep learning strategies for Chinese hamster ovary cell bioprocess optimization. Fermentation, 10, 234.
  5. Bai, L., & Hu, Y. (2018, May 19–20). Problem-driven teaching activities for the capstone project course of data science. ACM Turing Celebration Conference—China (pp. 130–131), Shanghai, China.
  6. Berikan, B., & Özdemir, S. (2020). Investigating “problem-solving with datasets” as an implementation of computational thinking: A literature review. Journal of Educational Computing Research, 58(2), 502–534.
  7. Bile Hassan, I., Ghanem, T., Jacobson, D., Jin, S., Johnson, K., Sulieman, D., & Wei, W. (2021). Data science curriculum design: A case study. In SIGCSE 2021—Proceedings of the 52nd ACM technical symposium on computer science education (pp. 529–534). Association for Computing Machinery, Inc.
  8. Boaler, J., Conte, K., Cor, K., Dieckmann, J. A., LaMar, T., Ramirez, J., & Selbach-Allen, M. (2024). Studying the opportunities provided by an applied high school mathematics course: Explorations in data science. Journal of Statistics and Data Science Education, 33, 26–45.
  9. Bonnell, J., Ogihara, M., & Yesha, Y. (2022). Challenges and Issues in Data Science Education. Computer, 55(2), 63–66.
  10. Boztaş, G. D., Berigel, M., & Altınay, F. (2024). A bibliometric analysis of educational data mining studies in global perspective. Education and Information Technologies, 29(7), 8961–8985.
  11. Burr, W., Chevalier, F., Collins, C., Gibbs, A. L., Ng, R., & Wild, C. J. (2021). Computational skills by stealth in introductory data science teaching. Teaching Statistics, 43(S1), S34–S51.
  12. Chao, L., Xing, C., Zhang, Y., & Zhang, C. (2020). Data science: State of the art and trends. Data Science and Informetrics, 1(1), 22–49.
  13. Christozov, D. G., Rasheva-Yordanova, K., & Toleva-Stoimenova, S. (2019). Challenges in designing curriculum for trans-disciplinary education: On cases of designing concentration on informing science and master program on data science. Informing Science: The International Journal of an Emerging Transdiscipline, 22, 19–30.
  14. Çetinkaya-Rundel, M., & Rundel, C. (2018). Infrastructure and tools for teaching computing throughout the statistical curriculum. The American Statistician. Available online: https://recursosvirtuales.konradlorenz.edu.co:2418/doi/abs/10.1080/00031305.2017.1397549 (accessed on 10 November 2024).
  15. Demchenko, Y., Belloum, A., de Laat, C., Loomis, C., Wiktorski, T., & Spekschoor, E. (2017, December 11–14). Customisable data science educational environment: From competences management and curriculum design to virtual labs on-demand. 2017 IEEE International Conference on Cloud Computing Technology and Science (CloudCom) (pp. 363–368), Hong Kong.
  16. Demchenko, Y., Comminiello, L., & Reali, G. (2019, March 27–29). Designing customisable data science curriculum using ontology for data science competences and body of knowledge. 2019 International Conference on Big Data and Education (pp. 124–128), London, UK.
  17. Demchenko, Y., Gruengard, E., & Klous, S. (2014, December 15–18). Instructional model for building effective big data curricula for online and campus education. 2014 IEEE 6th International Conference on Cloud Computing Technology and Science (pp. 935–941), Singapore.
  18. Demchenko, Y., José, C. G. J., Brewer, S., & Wiktorski, T. (2021a, April 21–23). EDISON data science framework (EDSF): Addressing demand for data science and analytics competences for the data driven digital economy. 2021 IEEE Global Engineering Education Conference (EDUCON) (pp. 1682–1687), Vienna, Austria.
  19. Demchenko, Y., Maijer, M., & Comminiello, L. (2021b, February 3–5). Data scientist professional revisited: Competences definition and assessment, curriculum and education path design. 2021 4th International Conference on Big Data and Education (pp. 52–62), London, UK.
  20. De Veaux, R., Agarwal, M., Averett, M., Baumer, B., Bray, A., Bressoud, T., Bryant, L., Cheng, L., Francis, A., Gould, R., Kim, A., Kretchmar, R., Lu, Q., Moskol, A., Nolan, D., Pelayo, R., Raleigh, S., Sethi, R., Sondjaja, M., . . . Ye, P. (2017). Curriculum guidelines for undergraduate programs in data science. Annual Review of Statistics and Its Application, 4.
  21. Dogucu, M., Demirci, S., Bendekgey, H., Ricci, F. Z., & Medina, C. M. (2024). A systematic literature review of undergraduate data science education research. Available online: https://www.semanticscholar.org/paper/A-Systematic-Literature-Review-of-Undergraduate-Dogucu-Demirci/3f9629641456ccfc66b0afb58206c0e2b692609b (accessed on 2 December 2024).
  22. Dogucu, M., Johnson, A. A., & Ott, M. (2023). Framework for accessible and inclusive teaching materials for statistics and data science courses. Journal of Statistics and Data Science Education, 31, 144–150. [Google Scholar] [CrossRef]
  23. Donoghue, T., Voytek, B., & Ellis, S. E. (2021). Teaching creative and practical data science at scale. Journal of Statistics and Data Science Education. Available online: https://doi.org/10.1080/10691898.2020.1860725 (accessed on 5 December 2024). [CrossRef]
  24. Drummond, D. E. (2016, October 12–15). Open sourcing education for data engineering and data science. 2016 IEEE Frontiers in Education Conference (FIE) (p. 1), Erie, PA, USA. [Google Scholar] [CrossRef]
  25. Echeverria, F., Kao, Y., & Hubbard Cheuoua, A. (2023, March 15–18). Using student and teacher feedback to modify CS curriculum. SIGCSE 2023—Proceedings of the 54th ACM Technical Symposium on Computer Science Education (Vol. 2, p. 1420), Toronto, ON, Canada. [Google Scholar] [CrossRef]
  26. Echeverria, V., Martinez-Maldonado, R., & Buckingham Shum, S. (2017, November 28–December 1). Towards data storytelling to support teaching and learning. 29th Australian Conference on Computer-Human Interaction (pp. 347–351), Brisbane, QLD, Australia. [Google Scholar] [CrossRef]
  27. Ellegaard, O., & Wallin, J. A. (2015). The bibliometric analysis of scholarly production: How great is the impact? Scientometrics, 105(3), 1809–1831. [Google Scholar] [CrossRef]
  28. Fernandez, C., Freitas, J., Blikstein, P., & de Deus Lopes, R. (2024). The design space of visualization tools for data science education: Literature review and framework for future designs. International Journal of Child-Computer Interaction, 100698. [Google Scholar] [CrossRef]
  29. Fitzgerald, B. K., Barkanic, S., Cárdenas-Navia, I., Chen, J., Elzey, K., Hughes, D., & Troyan, D. (2016). The BHEF national higher education and workforce initiative: A model for pathways to baccalaureate attainment and high-skill careers in emerging fields, Part 3. Industry and Higher Education, 30(6), 433–439. [Google Scholar] [CrossRef]
  30. Friedman, A. (2019). Data science syllabi measuring its content. Education and Information Technologies, 24(6), 3467–3481. [Google Scholar] [CrossRef]
  31. Frischemeier, D., Biehler, R., Podworny, S., & Budde, L. (2021). A first introduction to data science education in secondary schools: Teaching and learning about data exploration with CODAP using survey data. Teaching Statistics, 43(S1), S182–S189. [Google Scholar] [CrossRef]
  32. Golubski, C. (2016, October 12–15). Using inquiry-based learning in engineering statistics courses. 2016 IEEE Frontiers in Education Conference (FIE) (pp. 1–3), Erie, PA, USA. [Google Scholar] [CrossRef]
  33. Guzman, L. M., Pennell, M. W., Nikelski, E., & Srivastava, D. S. (2019). Successful integration of data science in undergraduate biostatistics courses using cognitive load theory. CBE—Life Sciences Education, 18(4), ar49. [Google Scholar] [CrossRef] [PubMed]
  34. Gymrek, M., & Farjoun, Y. (2016). Recommendations for open data science. GigaScience, 5(1), s13742-016-0127-4. [Google Scholar] [CrossRef]
  35. Hagen, L. (2020). Teaching undergraduate data science for information schools. Education for Information, 36(2), 109–117. [Google Scholar] [CrossRef]
  36. Hardin, J., Hoerl, R., Horton, N. J., Nolan, D., Baumer, B., Hall-Holt, O., Murrell, P., Peng, R., Roback, P., Temple Lang, D., & Ward, M. D. (2015). Data science in statistics curricula: Preparing students to “think with data”. The American Statistician, 69(4), 343–353. [Google Scholar] [CrossRef]
  37. Hazzan, O., & Mike, K. (2020). Ten challenges of data science education. Communications of the ACM. Available online: https://cacm.acm.org/blogcacm/ten-challenges-of-data-science-education/ (accessed on 3 October 2024).
  38. Hazzan, O., & Mike, K. (2021). A journal for interdisciplinary data science education. Communications of the ACM, 64(8), 10–11. [Google Scholar] [CrossRef]
  39. Heinemann, B., Opel, S., Budde, L., Schulte, C., Frischemeier, D., Biehler, R., Podworny, S., & Wassong, T. (2018, November 22–25). Drafting a data science curriculum for secondary schools. 18th Koli Calling International Conference on Computing Education Research (pp. 1–5), Koli, Finland. [Google Scholar] [CrossRef]
  40. Hicks, S. C., & Irizarry, R. A. (2018). A guide to teaching data science. The American Statistician, 72(4), 382–391. [Google Scholar] [CrossRef] [PubMed]
  41. Horton, N. J. (2022). 30 years of the Journal of Statistics and Data Science Education. Journal of Statistics and Data Science Education, 30(1), 1–2. [Google Scholar] [CrossRef]
  42. Horton, N. J., & Hardin, J. S. (2021). Integrating computing in the statistics and data science curriculum: Creative structures, novel skills and habits, and ways to teach computational thinking. Journal of Statistics and Data Science Education, 29, S1–S3. [Google Scholar] [CrossRef]
  43. Hoyt, R., & Wangia-Anderson, V. (2018). An overview of two open interactive computing environments useful for data science education. JAMIA Open, 1(2), 159–165. [Google Scholar] [CrossRef]
  44. Hsu, Y.-C. (2024). Mapping the landscape of data science education in higher general education in Taiwan: A comprehensive syllabi analysis. Education Sciences, 14, 763. [Google Scholar] [CrossRef]
  45. Huppenkothen, D., Arendt, A., Hogg, D. W., Ram, K., VanderPlas, J. T., & Rokem, A. (2018). Hack weeks as a model for data science education and collaboration. Proceedings of the National Academy of Sciences USA, 115(36), 8872–8877. [Google Scholar] [CrossRef] [PubMed]
  46. Kaplan, D. (2018). Teaching stats for data science. The American Statistician, 72(1), 89–96. [Google Scholar] [CrossRef]
  47. Kauermann, G., & Seidl, T. (2018). Data science: A proposal for a curriculum. International Journal of Data Science and Analytics, 6(3), 195–199. [Google Scholar] [CrossRef]
  48. Kenett, R. S., & Shmueli, G. (2016). Integrating InfoQ into data science analytics programs, research methods courses, and more. Wiley Data and Cybersecurity. [Google Scholar] [CrossRef]
  49. Ki Kim, S., Kim, T., & Kim, K. (2023, February 20–23). Analysis of teaching and learning environment for data science and AI education (focused on 2022 revised curriculum). 5th International Conference on Artificial Intelligence in Information and Communication, ICAIIC 2023 (pp. 788–790), Bali, Indonesia. [Google Scholar] [CrossRef]
  50. Klašnja-Milićević, A., Ivanović, M., & Budimac, Z. (2017). Data science in education: Big data and learning analytics. Computer Applications in Engineering Education, 25(6), 1066–1078. [Google Scholar] [CrossRef]
  51. Kloefkorn, T., Boardman, M., Horton, N. J., & Marshall, B. (2020). National academies’ roundtable on data science postsecondary education. In Proceedings of the 51st ACM technical symposium on computer science education (pp. 956–957). Association for Computing Machinery. [Google Scholar] [CrossRef]
  52. Kruskal, J. B., Berkowitz, S., Geis, J. R., Kim, W., Nagy, P., & Dreyer, K. (2017). Big data and machine learning—Strategies for driving this bus: A summary of the 2016 intersociety summer conference. Journal of the American College of Radiology, 14, 811–817. [Google Scholar] [CrossRef]
  53. Labou, S., Yoo, H. J., Minor, D., & Altintas, I. (2019, September 24–27). Sharing and archiving data science course projects to support pedagogy for future cohorts. 2019 15th International Conference on eScience (eScience) (pp. 644–645), San Diego, CA, USA. [Google Scholar] [CrossRef]
  54. Lee, V. R., & Delaney, V. (2022). Identifying the content, lesson structure, and data use within pre-collegiate data science curricula. Journal of Science Education and Technology, 31(1), 81–98. [Google Scholar] [CrossRef]
  55. Lewis, A., & Stoyanovich, J. (2021). Teaching responsible data science: Charting new pedagogical territory. International Journal of Artificial Intelligence in Education, 32, 783–807. [Google Scholar] [CrossRef] [PubMed]
  56. Li, X., Fan, X., Qu, X., Sun, G., Yang, C., Zuo, B., & Liao, Z. (2019). Curriculum reform in big data education at applied technical colleges and universities in China. IEEE Access, 7, 125511–125521. [Google Scholar] [CrossRef]
  57. Lilan, C., & Zhong, J. (2024). Intelligent recommendation system for College English courses based on graph convolutional networks. Heliyon, 10, e29052. [Google Scholar] [CrossRef] [PubMed]
  58. Lin, L., Zhou, D., Wang, J., & Wang, Y. (2024). A systematic review of big data driven education evaluation. Sage Open, 14(2), 21582440241242180. [Google Scholar] [CrossRef]
  59. Luo, Y., Han, X., & Zhang, C. (2024). Prediction of learning outcomes with a machine learning algorithm based on online learning behavior data in blended courses. Asia Pacific Education Review, 25, 267–285. [Google Scholar] [CrossRef]
  60. Marín-Marín, J.-A., López-Belmonte, J., Fernández-Campoy, J.-M., & Romero-Rodríguez, J.-M. (2019). Big data in education. A bibliometric review. Social Sciences, 8(8), 223. [Google Scholar] [CrossRef]
  61. Memarian, B., & Doleck, T. (2024). Data science pedagogical tools and practices: A systematic literature review. Education and Information Technologies, 29(7), 8179–8201. [Google Scholar] [CrossRef]
  62. Merryman, L., & Lu, S. (2021). Are fashion majors ready for the era of data science? A study on the fashion undergraduate curriculums in U.S. institutions. International Journal of Fashion Design, Technology and Education. Available online: https://doi.org/10.1080/17543266.2021.1884752 (accessed on 5 December 2024). [CrossRef]
  63. Mike, K. (2020). Data science education: Curriculum and pedagogy. In Proceedings of the 2020 ACM conference on international computing education research (pp. 324–325). Association for Computing Machinery. [Google Scholar] [CrossRef]
  64. Mike, K., Hazan, T., & Hazzan, O. (2020, November 19–22). Equalizing data science curriculum for computer science pupils. Koli Calling ’20: Proceedings of the 20th Koli Calling International Conference on Computing Education Research (pp. 1–5), Koli, Finland. [Google Scholar] [CrossRef]
  65. Mike, K., Kimelfeld, B., & Hazzan, O. (2023). The birth of a new discipline: Data science education. Harvard Data Science Review, 5(4). [Google Scholar] [CrossRef]
  66. Mikroyannidis, A., Domingue, J., Phethean, C., Beeston, G., & Simperl, E. (2018). Designing and delivering a curriculum for data science education across Europe. In Teaching and learning in a digital world (pp. 540–550). Springer. [Google Scholar] [CrossRef]
  67. Msweli, N. T., Mawela, T., & Twinomurinzi, H. (2023). Data science education—A scoping review. Journal of Information Technology Education: Research, 22, 263–294. [Google Scholar] [CrossRef]
  68. Oliveira, O. J., de Silva, F. F., da Juliani, F., Barbosa, L. C. F. M., & Nunhes, T. V. (2019). Bibliometric method for mapping the state-of-the-art and identifying research gaps and trends in literature: An essential instrument to support the development of scientific projects. In Scientometrics recent advances. IntechOpen. [Google Scholar] [CrossRef]
  69. Perron, B. E., Victor, B. G., Hiltz, B. S., & Ryan, J. (2020). Teaching note—Data science in the MSW curriculum: Innovating training in statistics and research methods. Journal of Social Work Education, 58(1), 193–198. [Google Scholar] [CrossRef]
  70. Raban, D., & Gordon, A. (2020). The evolution of data science and big data research: A bibliometric analysis. Scientometrics, 122(3), 1563–1581. [Google Scholar] [CrossRef]
  71. Raman, A., Thannimalai, R., Don, Y., & Rathakrishnan, M. (2021). A bibliometric analysis of blended learning in higher education: Perception, achievement and engagement. International Journal of Learning, Teaching and Educational Research, 20(6), 126–151. [Google Scholar] [CrossRef]
  72. Rampure, S., Shen, A., & Hug, J. (2021, March 17–21). Experiences teaching a large upper-division data science course remotely. 52nd ACM Technical Symposium on Computer Science Education (pp. 523–528), Toronto, ON, Canada. [Google Scholar] [CrossRef]
  73. Rao, Y. S. N., & Chen, C. J. (2024). Bibliometric insights into data mining in education research: A decade in review. Contemporary Educational Technology, 16(2), ep502. [Google Scholar] [CrossRef]
  74. Sakamaki, K., Taguri, M., Nishiuchi, H., Akimoto, Y., & Koizumi, K. (2022). Experience of distance education for project-based learning in data science. Japanese Journal of Statistics and Data Science, 5, 757–767. [Google Scholar] [CrossRef]
  75. Salas-Rueda, R.-A. (2021). Analysis of Facebook in the teaching-learning process about mathematics through data science. Canadian Journal of Learning and Technology, 47(2). [Google Scholar] [CrossRef]
  76. Salas-Rueda, R.-A., Eslava-Cervantes, A.-L., & Prieto-Larios, E. (2020). Teachers’ perceptions about the impact of Moodle in the educational field considering data science. Online Journal of Communication and Media Technologies, 10(4), e202023. [Google Scholar] [CrossRef]
  77. Saltz, J. S., Dewar, N. I., & Heckman, R. (2018, March 8–11). Key concepts for a data science ethics curriculum. 49th ACM Technical Symposium on Computer Science Education (pp. 952–957), Seattle, WA, USA. [Google Scholar] [CrossRef]
  78. Samsul, S. A., Yahaya, N., & Abuhassna, H. (2023). Education big data and learning analytics: A bibliometric analysis. Humanities and Social Sciences Communications, 10(1), 1–11. [Google Scholar] [CrossRef]
  79. Schwab-McCoy, A., Baker, C. M., & Gasper, R. E. (2021). Data science in 2020: Computing, curricula, and challenges for the next 10 years. Journal of Statistics and Data Science Education, 29, S40–S50. [Google Scholar] [CrossRef]
  80. Scopus. (n.d.a). Elsevier Scopus blog. Available online: https://blog.scopus.com/about (accessed on 9 January 2025).
  81. Scopus. (n.d.b). Scopus | Abstract and citation database | Elsevier. Available online: https://www.elsevier.com/products/scopus (accessed on 9 January 2025).
  82. Scopus. (2024). Scopus content | Elsevier. Available online: https://www.elsevier.com/products/scopus/content (accessed on 9 January 2025).
  83. Shao, G., Quintana, J. P., Zakharov, W., Purzer, S., & Kim, E. (2021). Exploring potential roles of academic libraries in undergraduate data science education curriculum development. The Journal of Academic Librarianship, 47(2), 102320. [Google Scholar] [CrossRef]
  84. Shao, Z., Yuan, S., Jin, Y., & Wang, Y. (2024). Scholar’s career switch from academia to industry: Mining and analysis from AMiner. Big Data Research, 36, 100441. [Google Scholar] [CrossRef]
  85. Shea, K. D., Brewer, B. B., Carrington, J. M., Davis, M., Gephart, S., & Rosenfeld, A. (2019). A model to evaluate data science in nursing doctoral curricula. Nursing Outlook, 67(1), 39–48. [Google Scholar] [CrossRef]
  86. Sun, W., Ding, Y., Wang, R., Liu, Y., Wang, Y., Zhu, B., & Liu, Q. (2024). Bibliometric analysis of assessment and evaluation in higher education: 2012–2023. Assessment & Evaluation in Higher Education, 49(8), 1121–1135. [Google Scholar] [CrossRef]
  87. Tamai, T., Okamoto, K., Iuchi, K., & Kawada, K. (2021). Development of teaching material to design a vehicle on data science in junior high school technology education. IEEJ Transactions on Electrical and Electronic Engineering, 16(10), 1407–1413. [Google Scholar] [CrossRef]
  88. Tobar, F., Bravo-Marquez, F., Dunstan, J., Fontbona, J., Maass, A., Remenik, D., & Silva, J. F. (2021). Data science for engineers: A teaching ecosystem. IEEE Signal Processing Magazine, 38(3), 144–153. [Google Scholar] [CrossRef]
  89. Tolsgaard, M. G., Boscardin, C. K., Park, Y. S., Cuddy, M. M., & Sebok-Syer, S. S. (2020). The role of data science and machine learning in health professions education: Practical applications, theoretical contributions, and epistemic beliefs. Advances in Health Sciences Education, 25, 1057–1086. [Google Scholar] [CrossRef] [PubMed]
  90. van Eck, N. J., & Waltman, L. (2010). Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics, 84(2), 523–538. [Google Scholar] [CrossRef] [PubMed]
  91. Visser, M., van Eck, N. J., & Waltman, L. (2021). Large-scale comparison of bibliographic data sources: Scopus, Web of Science, Dimensions, Crossref, and Microsoft Academic. Quantitative Science Studies, 2(1), 20–41. [Google Scholar] [CrossRef]
  92. Walker, R. E. (2024, March 10–13). Mapping curricula to skills and occupations using course descriptions. EDUNINE 2024—8th IEEE World Engineering Education Conference: Empowering Engineering Education: Breaking Barriers Through Research and Innovation, Guatemala City, Guatemala. [Google Scholar] [CrossRef]
  93. West, J. (2018). Teaching data science: An objective approach to curriculum validation. Computer Science Education, 28(2), 136–157. [Google Scholar] [CrossRef]
  94. Wiktorski, T., Demchenko, Y., & Belloum, A. (2017, December 11–14). Model curricula for data science EDISON data science framework. 2017 IEEE International Conference on Cloud Computing Technology and Science (CloudCom) (pp. 369–374), Hong Kong. [Google Scholar] [CrossRef]
  95. Williams, U., Brown, R., Davis, M., Pavri, T., & Shafiei, F. (2021). Teaching data science in political science: Integrating methods with substantive curriculum. PS: Political Science & Politics, 54(2), 336–339. [Google Scholar] [CrossRef]
  96. Williamson, B. (2015). Governing methods: Policy innovation labs, design and data science in the digital governance of education. Journal of Educational Administration and History, 47(3), 251–271. [Google Scholar] [CrossRef]
  97. Zhang, Y., Wu, D., Hagen, L., Song, I.-Y., Mostafa, J., Oh, S., Anderson, T., Shah, C., Bishop, B. W., Hopfgartner, F., Eckert, K., Federer, L., & Saltz, J. S. (2022). Data science curriculum in the iField. Journal of the Association for Information Science and Technology, 74, 641–662. [Google Scholar] [CrossRef] [PubMed]
  98. Zhang, Y., Zhang, T., Jia, Y., Sun, J., Xu, F., & Xu, W. (2017, May 20–28). DataLab: Introducing software engineering thinking into data science education at scale. 2017 IEEE/ACM 39th International Conference on Software Engineering: Software Engineering Education and Training Track (ICSE-SEET) (pp. 47–56), Buenos Aires, Argentina. [Google Scholar] [CrossRef]
Figure 1. Steps of the bibliometric analysis.
Figure 2. Publication research areas.
Figure 3. Annual scientific growth.
Figure 4. Top 20 country and region scientific production.
Figure 5. Countries/regions’ collaboration network.
Figure 6. Top 20 most relevant affiliations.
Figure 7. Top 20 relevant publication sources.
Figure 8. Top 20 authors’ production over time.
Figure 9. Co-occurrence of authors’ keywords: overlay visualization.
Figure 10. Co-occurrence of authors’ keywords: density visualization.
Figure 11. Thematic evolution.
Figure 12. Thematic map.
Figure 13. Analytical and knowledge skills.
Figure 14. Word cloud with courses’ names in data science education.
Figure 15. Word cloud with data science education topics.
Figure 16. Word cloud with the names of skills.
Table 1. Main dataset information.

| Description | Results |
|---|---|
| Main information about data | |
| Timespan | 2005–2024 |
| Sources (journals, books, etc.) | 631 |
| Documents | 1245 |
| Average citations per document | 7.946 |
| Average citations per year per document | 2.99 |
| References | 36,853 |
| Document types | |
| Article | 551 |
| Book chapter | 47 |
| Conference paper | 597 |
| Review | 50 |
| Document contents | |
| Keywords Plus (ID) | 4821 |
| Author’s keywords (DE) | 2886 |
| Authors | |
| Authors | 3980 |
| Authors of single-authored documents | 205 |
| Author collaboration | |
| Single-authored documents | 227 |
| Co-authors per document | 3.68 |
| International co-authorships (%) | 16.31 |
Table 2. Three articles relevant to annual scientific growth.

| Article Citation | Keywords | Type | Main Topic |
|---|---|---|---|
| (Kruskal et al., 2017) | ACR; big data; data science; deep learning; imaging informatics; Intersociety Committee; machine learning; radiology | Descriptive, summary | Discussion about applications of machine learning for image analysis. |
| (Baako et al., 2024) | biomanufacturing; bioprocess engineering; Chinese hamster ovary (CHO) cells; data science; deep learning; multivariate statistical analysis; recombinant protein production | Review | Machine learning and deep learning in bioprocessing. |
| (Mike, 2020) | computer science education; data science education | Conference paper | Pedagogical aspects of data science education. |
Table 3. Three relevant articles in the classification per country.

| Country | Article | Keywords | Type | Main Topic |
|---|---|---|---|---|
| United States | (Kruskal et al., 2017) | ACR; big data; data science; deep learning; imaging informatics; Intersociety Committee; machine learning; radiology | Descriptive, summary | Discussion about applications of machine learning for image analysis |
| | (Baako et al., 2024) | biomanufacturing; bioprocess engineering; Chinese hamster ovary (CHO) cells; data science; deep learning; multivariate statistical analysis; recombinant protein production | Review | Machine learning and deep learning in bioprocessing |
| | (F. Echeverria et al., 2023) | CRISP-DM; data mining; data visualization; database; information technology education; introductory data science | Survey | Integrating data science into a general education information technology course |
| | (Li et al., 2019) | data science; data science education; middle school | Article | Student and teacher feedback to modify CS curriculum |
| China | (Luo et al., 2024) | Data science applications in education; Distributed learning environments; Evaluation methodologies; Interdisciplinary projects; Postsecondary education | Article, data analysis | Use of machine learning for predictions in education analyzing blended courses |
| | (Z. Shao et al., 2024) | Career mining; Data mining & analytics; Data science; Knowledge and technology transfer; Science of science; Scientific big data | Review | Knowledge and technology transfer and the research change of scholars |
| | (Lilan & Zhong, 2024) | Data science applications in education; Distance education and online learning; Human-computer interface; Learning communities; Teaching/learning strategies | Article, analysis with neural networks | Recommendation systems; a graph convolutional neural network model based on college English course texts, students’ major, English foundation, and network structure characteristics |
| United Kingdom | (Mikroyannidis et al., 2018) | Courseware; Curricula; Data science; Demand analysis; Personalised learning pathways; Skills | Conference paper | Presentation of the initiative entitled European Data Science Academy (EDSA) for training new generations of data scientists |
| | (Dogucu et al., 2023) | Accessibility; Curriculum; Inclusion; Textbooks | Article, descriptive | Introducing a framework for developing accessible and inclusive course materials |
| | (Demchenko et al., 2014) | Andragogy; Big data architecture framework; Bloom’s taxonomy; Common body of knowledge; Education and training on big data technologies; Instructional methodology; Online education | Conference paper | Description of topics for common body of knowledge for data science and big data technology domains |
Table 4. Three relevant articles in the classification per affiliation.

| Article | Keywords | Type | Main Topic |
|---|---|---|---|
| (Lee & Delaney, 2022) | Curriculum analysis; Data literacy; Data science education; Data science lessons; Secondary school; Statistics education | Article, analysis of data science curricula | Analysis of curricula and professional practice |
| (Boaler et al., 2024) | Data science education; Math pathways; Mixed methods | Article, analysis of data science course | Analysis of a high school data science course to identify who takes more mathematics courses focusing on STEM |
| (Tolsgaard et al., 2020) | Artificial intelligence; data science; Machine learning; Medical education research; Research in Health Professions Education | Article, critical review | Analysis of the roles that data science and machine learning play in health professions |
Table 5. Information from the top 20 authors.

| No. | Author | Name in Publications | ORCID | Affiliation | Region/Country |
|---|---|---|---|---|---|
| 1 | SALAS-RUEDA RA | Ricardo-Adán Salas-Rueda | 0000-0002-4188-4610 | Instituto de Ciencias Aplicadas y Tecnología, Universidad Nacional Autónoma de México | Mexico |
| 2 | DEMCHENKO Y | Yuri Demchenko | 0000-0001-7474-9506 | University of Amsterdam | Amsterdam, Noord-Holland, Netherlands |
| 3 | MIKE K | Koby Mike | 0000-0002-0977-9845 | Department of Education in Science and Technology, Technion | Haifa, Israel |
| 4 | HAZZAN O | Orit Hazzan | 0000-0002-8627-0997 | Department of Education in Science and Technology, Technion | Haifa, Israel |
| 5 | ALVARADO-ZAMORANO C | Clara Alvarado-Zamorano | 0000-0001-9122-7590 | Instituto de Ciencias Aplicadas y Tecnología, Universidad Nacional Autónoma de México | Mexico |
| 6 | WILLIAMSON B | Ben Williamson | 0000-0001-9356-3213 | Centre for Research in Digital Education, School of Education, University of Edinburgh | Edinburgh, United Kingdom |
| 7 | BIEHLER R | Rolf Biehler | 0000-0002-9815-1282 | Paderborn University | Paderborn, Germany |
| 8 | CAR J | Josip Car | 0000-0001-8969-371X | Nanyang Technological University; Imperial College London | Singapore; Westminster, London, United Kingdom |
| 9 | GUO PJ | Philip J. Guo | No information or no public ORCID profile | University of California | San Diego, United States |
| 10 | HORTON NJ | Nicholas J. Horton | 0000-0003-3332-4311 | Department of Mathematics and Statistics, Amherst College | Amherst, United States |
| 11 | LEE VR | Victor R. Lee | 0000-0001-6434-7589 | Stanford University | Stanford, United States |
| 12 | RAJ RK | Rajendra K. Raj | 0000-0003-2378-1068 | Rochester Institute of Technology | Rochester, NY, United States |
| 13 | DAVIS KC | Karen C. Davis | 0000-0003-2327-4429 | Computer Science and Software Engineering Department, Miami University | Oxford, United States |
| 14 | DE-LA-CRUZ-MARTÍNEZ G | Gustavo De-La-Cruz-Martínez | 0000-0002-4446-7396 | Instituto de Ciencias Aplicadas y Tecnología, Universidad Nacional Autónoma de México | Mexico |
| 15 | DOGUCU M | Mine Dogucu | 0000-0002-8007-934X | University of California | Irvine, California, United States |
| 16 | MEINERT E | Edward Meinert | 0000-0003-2484-3347 | University of Plymouth; Newcastle University | Plymouth, United Kingdom; Newcastle, United Kingdom |
| 17 | PFANNKUCH M | Maxine Pfannkuch | 0000-0002-2202-9678 | The University of Auckland | Auckland, New Zealand |
| 18 | SAKR M | Majd Sakr | 0000-0001-5150-8259 | Carnegie Mellon University | Pittsburgh, United States |
| 19 | WU W | Wensheng Wu | 0000-0002-2948-9773 | Computer Science Department, University of Southern California | Los Angeles, United States |
| 20 | ADAMS J | Joshua Adams | 0000-0002-7185-9125 | Saint Leo University | Saint Leo, United States |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Avila-Garzon, C.; Bacca-Acosta, J. Curriculum, Pedagogy, and Teaching/Learning Strategies in Data Science Education. Educ. Sci. 2025, 15, 186. https://doi.org/10.3390/educsci15020186
