Applied Sciences
  • Article
  • Open Access

5 September 2023

Analyzing the Alignment between AI Curriculum and AI Textbooks through Text Mining

1 Department of Computer Science and Engineering, Graduate School, Korea University, Seoul 02841, Republic of Korea
2 Major of Computer Science Education, Graduate School of Education, Korea University, Seoul 02841, Republic of Korea
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue ICTs in Education

Abstract

The field of artificial intelligence (AI) is permeating education worldwide, reflecting societal changes driven by advancements in computing technology and the data revolution. Herein, we analyze the alignment between core AI educational curricula and textbooks to provide guidance on structuring AI knowledge. Text mining techniques using Python 3.10.3 and frame-based content analysis tailored to the computing field are employed to examine a substantial amount of text data within educational curriculum textbooks. We comprehensively examine the frequency of knowledge incorporated in AI curricula, topic structure, and practical tool utilization. The degree to which keywords are reflected in curriculum textbooks and in the textbook characteristics are determined using Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF-IDF) analysis, respectively. The topic structure distribution is derived by Latent Dirichlet Allocation (LDA) topic modeling and the trained model is visualized using PyLDAvis. Furthermore, the variation in vertical content range or level is investigated by content analysis, considering the tools used to teach similar AI knowledge. Lastly, the implications for AI curriculum structure are discussed in terms of curriculum composition, knowledge construction, practical application, and curriculum utilization. This study provides practical guidance for structuring curricula that effectively foster AI competency based on a systematic research methodology.

1. Introduction

In the era of digital transformation, computational thinking, artificial intelligence (AI) knowledge, and digital literacy are recognized as core competencies for students and professionals. Informatics education plays a crucial role in strengthening basic competencies in the computing field and cultivating essential individual capabilities in the modern world [1,2,3].
As a result of the social changes induced by the data revolution and the advances in computing technology, the AI field, in particular, has been increasingly incorporated into the educational systems of several countries worldwide. AI traces its roots back to the 1950s but has truly flourished over the past 13 years, alongside related computing fields like robotics [4]. Notably, there has been a growing focus on developing and reinforcing AI curricula in various organizations and countries, including the Association for Computing Machinery (ACM)/IEEE, UNESCO, the United States, India, China, and Korea [4,5,6,7,8,9,10]. The ACM introduced the Computing Curricula 2020 (CC 2020), which includes data science as a computing field and incorporates AI-related content within three of the eleven data science knowledge areas—artificial intelligence, data mining, and machine learning—and the AI body of knowledge within one of the eleven computer science field areas—intelligent systems [4,5,6].
Regarding the K-12 standard curriculum, UNESCO published a K-12 AI Curriculum founded on competency-based education [3]. Additionally, the Association for the Advancement of Artificial Intelligence (AAAI) and the Computer Science Teachers Association (CSTA) jointly launched the AI for K-12 Students (AI4K12) initiative and introduced K-12 AI Guidelines based on the five big ideas for AI education [7]. At the national level, the Central Board of Secondary Education (CBSE) of India has designated AI as a skill subject for senior secondary levels XI and XII [8], whereas China has implemented a mandatory elective module for high school students, titled “Preliminary Artificial Intelligence” (“人工智能初步”) [9]. In 2020, Korea established “Introduction to AI” as a career elective subject in high schools, leading to the development of eight types of “Introduction to AI” textbooks [10].
Curricula can be classified into two categories: national-level curriculum systems, in which curricula are developed and evaluated by the national government; and localized curriculum systems, in which curricula are structured by local governments or individual schools [11,12]. In K-12 education, school curricula are realized through textbooks, i.e., textbooks serve as the primary learning materials through which students engage with educational content in both national-level and localized curriculum systems. Consequently, textbooks constitute important indicators that guide teachers in the design of teaching and learning strategies.
Research on textbook analysis has adopted diverse approaches that involve multiple analysis and data consistency perspectives depending on the research objectives and academic characteristics of the subject. Studies have explored textbooks from a “knowledge perspective”, analyzing the discipline or subject matter addressed in the content, as well as from a “social perspective”, examining how educational content reflects social issues or cultural and historical changes. Most studies have focused on analyzing the textual content of textbooks. In cases where textbooks are developed and supplied based on a national-level curriculum system, the studies investigated the changes in content between textbooks published after different curriculum revisions. Previous research has employed various analysis methods, including “semantic analysis through content analysis”, “statistical analysis”, and “mixed methods research”. Applying information technology to textbook data opens new perspectives for fundamental questions in educational research. Consequently, text mining has emerged as an effective tool for analyzing textbooks. AI-driven natural language processing technologies enable frequency analysis of large amounts of text data in textbooks through the Term Frequency-Inverse Document Frequency (TF-IDF) method. Additionally, significance, concordance, and relevance can be objectively calculated through topic modeling using Latent Dirichlet Allocation (LDA).
Since curricula are influenced by social and cultural contexts, the composition and content of AI curricula will differ depending on the institutions or entities responsible for their development. Given the increasing integration and emphasis on AI from higher to K-12 computing education worldwide, it is crucial to investigate the knowledge composition of AI curricula. Therefore, this study aims to evaluate the alignment between AI curricula and textbooks by combining text mining technology and content analysis methods and to suggest directions for enhancing the knowledge composition of AI curricula.

3. Research Methods

This study aimed to systematically analyze the content of AI-related textbooks and propose directions for AI knowledge composition. To achieve this purpose, the research combined text mining and content analysis to assess the concordance between the AI curriculum and textbooks. The research procedure followed the steps shown in Figure 2.
Figure 2. Research process.
First, text mining was conducted using Python, employing TF-IDF analysis and LDA topic modeling. The text was extracted from the main body of the textbooks, and a suitable data preprocessing strategy for text data was implemented. The TF-IDF analysis and LDA topic modeling were then applied to examine the association and peculiarity of AI knowledge composition based on the four areas proposed in the Korean AI curriculum. Second, content analysis was performed by composing a frame and conducting coding to analyze a platform based on the fundamental AI curriculum. In AI education, both theoretical understanding and practical training hold equal importance. The scope and level of vertical content may vary depending on the tool used, even when covering the same content. Finally, the implications for AI curriculum knowledge composition were discussed by conducting a comprehensive analysis of the results obtained through text mining and content analysis.

3.1. Data Collection

In Korea, a basic AI curriculum was set to be established for high schools by 2020 [10]. The Introduction to AI curriculum comprises four areas: “1. Understanding of AI”, “2. Principles and application of AI”, “3. Data and machine learning”, and “4. Social impact of AI”. The curriculum document delineates key concepts, content elements, and learning elements for each area. The contents of the Basic AI Curriculum are presented in Table 5.
Table 5. Contents of basic AI curriculum.
“1. Understanding of AI” encompasses topics such as “social changes and changes in career and occupation due to advances in AI technology” and “AI forming relationships with humans and being utilized as an intelligent agent”. “2. Principles and application of AI” covers aspects like “types and characteristics of various AI approaches for implementing recognition, search, inference, and learning” and “acquiring principles of and differences between approaches based on actual cases”. “3. Data and machine learning” includes topics on “data attribute perspective, machine learning, and classification models” and “problem-solving and performance evaluation methods”. Lastly, “4. Social impact of AI” focuses on “social values of AI and impact recognition” and “social responsibilities and fairness practice as a member of AI society”.
Based on the basic AI curriculum, eight AI textbooks were developed in 2021. This study analyzed the text data of the main text in four units from the eight textbooks.
Table 6 presents the frequency (ratio) of each content area in the eight textbooks [45,46,47,48,49,50,51,52]. For each textbook, the ratio was calculated by dividing the “number of pages allocated to each area” by the “total number of pages”. This ratio indicates the relative importance that each textbook assigns to each area.
Table 6. Frequency (%) of each content area in the composition of the different textbooks analyzed.
Unit 2 of all textbooks exhibited the highest ratios, ranging from 35.4% to 40.5%. Unit 3 had the second-highest ratio, ranging from 25.1% to 35.0%. The ratio of Unit 1 ranged from 13.5% to 17.5%, while that of Unit 4 ranged from 13.0% to 21.3%. In other words, Units 2 and 3, which displayed a high frequency of content and learning elements in the curriculum documents, held significant weight across all textbooks.
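The page-ratio calculation behind Table 6 can be sketched in a few lines of Python. This is an illustrative sketch only: the page counts below are hypothetical placeholders, not values from the analyzed textbooks, and the function name `area_ratios` is an assumption. It also assumes that the "total number of pages" is the sum of the pages of the four units.

```python
def area_ratios(pages_per_area: dict[str, int]) -> dict[str, float]:
    """Return the share (%) of pages allocated to each curriculum area."""
    total = sum(pages_per_area.values())
    if total == 0:
        raise ValueError("textbook has no pages assigned to any area")
    return {area: round(100 * n / total, 1) for area, n in pages_per_area.items()}


# Hypothetical page counts for one textbook (not taken from the study).
hypothetical_pages = {"Unit 1": 30, "Unit 2": 76, "Unit 3": 60, "Unit 4": 34}
ratios = area_ratios(hypothetical_pages)

for area, share in ratios.items():
    print(f"{area}: {share}%")
```

With these placeholder counts, Unit 2 receives the largest share, which mirrors the pattern reported for all eight textbooks.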

3.2. Data Preprocessing

Data preprocessing is a crucial step in text mining for the analysis of unstructured data. In this study, data preprocessing focused on accurately analyzing technical terms in Korean-language documents related to education in the computing field [53,54]. The data preprocessing procedure was as follows:
Firstly, since the targets of this study were Korean-language curricula and textbooks, the KoNLPy package, a Korean morphological analysis library that reflects the characteristics of the language, was installed. The text data was then preprocessed using KOMORAN, a morpheme analyzer (parser).
Secondly, the technical terms provided by KOMORAN were employed, considering the analysis focus on documents in the educational field of AI. Technical terms composed of two or more words in KOMORAN, such as “computer vision”, “supervised learning”, and “voice recognition”, were processed as a single corpus.
Thirdly, abbreviations that expressed the same terms, such as using “AI” for “artificial intelligence”, were consolidated into a single corpus.
Fourthly, the analyzed data were transformed into a corpus and tokenized using morphemes. By selecting specific parts of speech and words with one or more letters, the normalization process was enhanced through iterations of stemming, stop-word elimination (numbers, special characters, punctuation marks), and review.
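The four preprocessing steps above can be illustrated with a minimal sketch. It is not the authors' code. It assumes the text has already been tagged into (morpheme, part-of-speech) pairs, which is the shape returned by the `pos` method of KoNLPy's `Komoran` tagger. The tag set kept, the abbreviation map, the English stand-in tokens, and the function name `normalize` are all hypothetical, and stemming is omitted for brevity.

```python
import re

# Parts of speech kept for analysis (KOMORAN tags: common noun, proper noun, verb).
# Which tags the study actually retained is an assumption.
KEEP_TAGS = {"NNG", "NNP", "VV"}

# Hypothetical map that folds equivalent expressions into a single token.
SYNONYMS = {"artificial intelligence": "AI"}

# Matches tokens made up only of digits, punctuation, or special characters.
NON_WORD = re.compile(r"[\W\d_]+")


def normalize(tagged, keep_tags=KEEP_TAGS, synonyms=SYNONYMS, min_len=1):
    """Filter tagged morphemes and consolidate synonyms into one token."""
    tokens = []
    for morph, tag in tagged:
        if tag not in keep_tags:
            continue  # drop particles, numerals, punctuation, and so on
        if len(morph) < min_len or NON_WORD.fullmatch(morph):
            continue  # drop too-short or non-word tokens
        tokens.append(synonyms.get(morph, morph))
    return tokens


# In practice this list would come from a morpheme analyzer with a user
# dictionary for multi-word technical terms; here it is written by hand.
tagged_sentence = [
    ("artificial intelligence", "NNP"),
    ("data", "NNG"),
    ("2021", "SN"),
    ("learn", "VV"),
    (".", "SF"),
    ("AI", "NNP"),
]
tokens = normalize(tagged_sentence)
print(tokens)
```

Keeping the tagger outside the function makes the filtering logic testable without a Java runtime, which KoNLPy's KOMORAN wrapper requires.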
Table 7 presents the frequency of each corpus according to the textbook and morpheme type.
Table 7. Corpus frequency in the composition of the eight textbooks.
The corpus frequency varied across textbooks, ranging from 4928 to 14,042. Proper nouns (NNP), common nouns (NNG), and verbs (VV) exhibited higher corpus frequencies.

3.3. Analysis Method

In this study, a comprehensive analysis of knowledge frequency, topic composition, and practical training tools in AI curricula was conducted through text mining using Python 3.10.3 and a frame-based content analysis that reflects the specificity of computing field textbooks on a large scale.
Firstly, the study examined whether the keywords in each textbook were aligned with the learning elements of the curricula. This was accomplished by performing a TF analysis for each area in the textbooks. Specifically, the top 20 keywords with the highest TF values in the eight textbooks were compared against an analysis frame composed of the learning elements of each area in the curriculum, to assess the extent to which those learning elements are covered in the textbooks. To capture the linkage between the curriculum areas and the textbook chapters, each keyword was marked in one of two ways: a keyword presented as a learning element of the corresponding curriculum area was marked (√), and a keyword related to the learning elements of another area was marked (*).
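As a rough illustration of this marking scheme, the sketch below counts term frequencies for one unit and labels each top keyword with √ or *. The token list and the two learning-element sets are hypothetical stand-ins, and the function names are assumptions; the curriculum's actual learning elements are those listed in Table 5.

```python
from collections import Counter


def top_keywords(tokens, k=20):
    """Return the k most frequent tokens with their raw term frequencies."""
    return Counter(tokens).most_common(k)


def mark_keywords(top, own_elements, other_elements):
    """Attach a mark to each keyword: learning element of this area or of another."""
    marked = []
    for word, freq in top:
        if word in own_elements:
            mark = "√"  # learning element of the corresponding area
        elif word in other_elements:
            mark = "*"  # learning element of another area
        else:
            mark = ""   # not a learning element in the curriculum
        marked.append((word, freq, mark))
    return marked


# Hypothetical tokens from Unit 1 of one textbook.
unit1_tokens = ["AI"] * 5 + ["data"] * 3 + ["intelligent agent"] * 2 + ["human"]

# Hypothetical learning-element sets for the analysis frame.
unit1_elements = {"intelligent agent"}
other_area_elements = {"data", "sensor"}

marked = mark_keywords(top_keywords(unit1_tokens), unit1_elements, other_area_elements)
for word, freq, mark in marked:
    print(f"{word} ({freq}) {mark}")
```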
Secondly, TF-IDF was employed to deduce the top 15 words with the highest TF-IDF values in each textbook. This analysis aimed to identify the characteristics of the textbooks by distinguishing between terms related to the computing field and keywords related to cases. Among the keywords with high TF-IDF values in each area of the individual textbooks, “computing field-related keywords” were marked (*) so that their implications could be examined.
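A minimal TF-IDF sketch follows, treating each textbook as one document. The study does not state which TF-IDF weighting variant was used, so the formula here (raw term frequency multiplied by the natural log of N/df) is an assumption, as are the function names and the toy token lists. The example words "iris" and "meal" echo the case keywords reported for textbooks A and B.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute tf * ln(N / df) for every term of every document.

    docs maps a document name to its list of tokens.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each term once per document

    scores = {}
    for name, tokens in docs.items():
        term_freq = Counter(tokens)
        scores[name] = {
            word: count * math.log(n_docs / doc_freq[word])
            for word, count in term_freq.items()
        }
    return scores


def top_terms(doc_scores, k=15):
    """Return the k terms with the highest TF-IDF value in one document."""
    return sorted(doc_scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Toy token lists standing in for two textbooks.
docs = {
    "A": ["AI", "data", "iris", "iris"],
    "B": ["AI", "data", "meal"],
}
scores = tf_idf(docs)
print(top_terms(scores["A"], k=3))
```

Terms that appear in every document, such as "AI" and "data" here, receive a score of zero under this variant, which is why TF-IDF surfaces textbook-specific vocabulary rather than the shared high-frequency keywords found by the TF analysis.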
Thirdly, LDA topic modeling was conducted to analyze the topic composition of each textbook and compare them with the areas of the AI curriculum. The trained model was visualized using the LDA visualization library pyLDAvis for Python, as shown in Figure 3.
Figure 3. The trained model visualized using the LDA visualization library pyLDAvis.
The distance between the circles in the visualization represents the discriminant validity. A greater distance between topics indicates higher discriminant validity, indicating distinct topics. On the other hand, a closer distance or overlapping topics suggests lower discriminant validity and similarity among topics. The size of the circles indicates the proportion of data accounted for by each topic. The distribution of keywords within each topic was computed and comparatively analyzed against the curriculum’s areas.
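To make the notion of inter-topic distance concrete, the sketch below computes the Jensen-Shannon divergence between topic-word distributions. To our understanding, this is the measure pyLDAvis uses by default before projecting topics onto two dimensions, but the code is a standalone illustration and not the library's implementation. The three topic distributions are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q), skipping zero-probability terms of p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, and bounded by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical topic-word distributions over a four-word vocabulary.
topic_a = [0.7, 0.2, 0.1, 0.0]
topic_b = [0.6, 0.3, 0.1, 0.0]  # similar to topic_a: circles would overlap
topic_c = [0.0, 0.1, 0.2, 0.7]  # very different: circles would lie far apart

print(js_divergence(topic_a, topic_b))
print(js_divergence(topic_a, topic_c))
```

A small divergence corresponds to nearby or overlapping circles (low discriminant validity), and a large divergence corresponds to well-separated topics.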
Fourthly, a content analysis framework was designed to reflect the specificity of the computing field. In the “Understanding of AI” section of the curriculum, the teaching and learning direction specifies the need to “select and use educational tools and platforms that are appropriate for the level of learners and laboratory environment”. Therefore, a content analysis frame was developed to identify the composition of educational tools per unit in each textbook. After deducing the tools for each unit from the textbook’s main body, the frame was constructed by clustering the tools according to their roles, and coding was performed accordingly.
Ultimately, this study evaluated the alignment between AI curricula and textbooks by integrating text mining technology and content analysis results. Based on the findings of this analysis, implications for AI curriculum composition were suggested.

4. Textbook Analysis Results

4.1. Evaluation of Consistency between Curriculum and Textbooks through Frequency Analysis

4.1.1. Consistency of Curriculum and Textbooks through TF Analysis

A comparison and analysis of the top 20 keywords with high TF analysis results for eight textbooks were conducted using the analysis framework composed of learning elements for each domain of the curriculum. Using this as a foundation, the study explores the extent to which each domain’s learning elements are represented in textbooks.
The concordance between the curriculum and textbooks, as inferred from the TF analysis of each textbook, is presented in Table 8.
Table 8. TF analysis results by curriculum area and textbook.
First, we examine the perspective of “keywords that are learning elements of the curriculum but are not included in textbooks”. In the “2. Principles and Application of AI” section, “sensor” is identified as a learning element that emphasizes the importance of data collection through various types of sensors. However, the term “sensor” had a relatively low frequency in five out of the eight textbooks. Similarly, the frequency of “computer vision” was high in only one of the eight textbooks, while “robot vision” did not appear in any of the textbooks. Terms such as “search”, “inference”, “classification”, “clustering”, and “forecasting” were not very frequent in most textbooks, and none of the textbooks had a high frequency of the term “deep learning”. For instance, textbook A did not list “search”, “inference”, “clustering”, “classification”, “forecasting”, or “deep learning” as top frequency keywords in the “2. Principles and Application of AI” section.
In the “3. Data and Machine Learning” section, no textbook had a high frequency of the term “unstructured data”, whereas keywords related to “core attribute extraction”, “training data”, “test data”, and “performance evaluation” were only found in certain textbooks. In the “4. Social Impact of AI” section, no textbook included top frequency keywords related to “values of AI”. However, the keywords related to “fairness of AI” exhibited a high frequency in four of the textbooks (50%).
Next, we consider the perspective of “keywords from other areas included with a high frequency”. In textbook E, keywords from other sections, such as “data (24)”, “inference (13)”, and “recognition (8)”, were very frequent in the first section of “1. Understanding of AI”. Textbook A included “learning (17)” and “robot (12)”, while textbook B featured “data (15)” and “robot (9)” with high frequency in the first section of “1. Understanding of AI”. In the “2. Principles and Application of AI” section, textbook B had the most frequent occurrences of “learning (58)” and “data (53)” among the learning elements of “3. Data and Machine Learning”. In textbooks A, C, E, F, G, and H, the term “data” appeared with frequencies of 63, 37, 84, 85, 88, and 44, respectively.
Finally, we explored the perspective of “keywords demonstrating a high frequency regardless of curriculum area or textbook type”. Certain terms such as “AI”, “data”, “information”, “human (or “person”)”, and “technology” were top frequency keywords in most textbooks.

4.1.2. Evaluation of Textbook Specificity through TF-IDF Analysis

To understand the characteristics of each textbook, the top 15 words with high TF-IDF values were derived using TF-IDF. The textbooks’ characteristics were examined by distinguishing between “terms related to the computing field” and “keywords associated with the case studies”. The results of examining the specificity of each textbook through the TF-IDF analysis are listed in Table 9.
Table 9. TF-IDF analysis results by curriculum area and textbook.
First, in textbook A, the TF-IDF values for “supervised learning (6.33)” and “pattern recognition (3.92)” were particularly high in the “2. Principles and Application of AI” section, which focused on machine learning. The TF-IDF values in textbook C were high for “node (6.58)” and “proposition (4.7)”. Textbook G exhibited the highest TF-IDF value of 17.65 among the textbooks for “propositional logic”, which was discussed in the context of inference. Furthermore, in textbook G, “node (12.69)”, “proposition (10.34)”, “supervised learning (8.63)”, and “linear regression (8.32)” had significant TF-IDF values. Second, there were cases where the TF-IDF values for terms from other sections were high. In textbook A, “computer vision”, typically addressed in the “2. Principles and Application of AI” section, was included in Unit 1, with a TF-IDF value of 2.94. In textbook C, “1. Understanding AI” and “3. Data and Machine Learning” included cases related to the “4. Social Impact of AI” section, specifically the ethics of AI. Third, we focus on keywords in the computing field that are distinctively used in textbooks. In textbook A, “computing” and “data type” had high TF-IDF values as general terms in the computing field, although they were not among the top frequency keywords in the textbooks. In textbook D, “software”, “function”, and “library” had high TF-IDF values. Textbook G emphasized “security (8.32)” with a high TF-IDF value in the “4. Social Impact of AI” section, indicating the importance of this particular aspect in the context of AI ethics. Finally, the textbooks differed in terms of the cases and data used in the theoretical explanations and practical exercises. For instance, in the “3. Data and Machine Learning” section, textbook A utilized animal or plant data such as “iris”, “salmon”, “bass”, and “fish”, while textbook B used keywords like “meal”, “clothes”, “protein”, “snack bar”, and “go to school” as practical training examples. 
In textbook H, real-life keywords, such as “movie”, “question”, “answer”, “review”, and “actor”, were utilized.

4.2. Evaluation of Textbook Knowledge Composition through LDA Topic Modeling Analysis

LDA topic modeling was used to analyze the correspondence between the topic composition of each textbook and the area composition of the AI curriculum, with the aim of examining the implications for AI knowledge composition. Analyzing the composition of topics in each textbook makes it possible to determine how the topics are structured and to calculate the weights of the keywords that constitute each topic. The trained model was visualized using pyLDAvis, a Python library for LDA visualization, and the results were reviewed. The results are shown in Table 10.
Table 10. LDA analysis results for each textbook and evaluation of consistency with the curriculum.
The results are outlined below:
First, the topics in each textbook were categorized into two or three distinct groups. None of the textbooks shared identical topics. Two textbooks consisted of two topics, while the remaining six contained three topics. Among the six textbooks with three topics, textbook D had two similar, overlapping topics, indicating low discriminant validity (Figure 4). Second, the topic size significantly influenced the data weight. There was a noticeable disparity in topic size in textbooks B and H, which featured two topics, with one topic accounting for a higher proportion of keywords than the other. Third, the evaluation results for curriculum consistency varied for each topic. In some cases, a single curriculum area was spread across multiple topics. For example, in textbook A, the topics were categorized into the ((2,3), (1,4), (2)) areas of the curriculum, whereas textbook B’s topics were grouped into ((2,3), (1,2,4)), with multiple topics falling under “2. Principles and Application of AI”. The topics in textbook H were classified into ((4), (1,2,3)), wherein “4. Social Impact of AI” formed one topic, whereas the other three areas were grouped together in the other topic.
Figure 4. The trained model visualized using pyLDAvis.
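The topic-to-area groupings reported for textbooks A, B, and H can be checked mechanically. The short sketch below, an illustration rather than part of the study's pipeline, encodes those three groupings and lists the curriculum areas that are spread across more than one topic. The function name `areas_in_multiple_topics` is an assumption.

```python
from collections import Counter

# Topic-to-area groupings as reported in the text for textbooks A, B, and H.
# Each tuple lists the curriculum areas covered by one LDA topic.
topic_areas = {
    "A": [(2, 3), (1, 4), (2,)],
    "B": [(2, 3), (1, 2, 4)],
    "H": [(4,), (1, 2, 3)],
}


def areas_in_multiple_topics(topics):
    """Return the curriculum areas that appear in more than one topic."""
    counts = Counter(area for topic in topics for area in topic)
    return sorted(area for area, n in counts.items() if n > 1)


for textbook, topics in topic_areas.items():
    print(textbook, areas_in_multiple_topics(topics))
```

For textbooks A and B the only repeated area is area 2, while textbook H has no area split across topics.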

4.3. Tool Utilization through Content Analysis

Content analysis was used to structure a framework for analyzing platforms based on the foundational AI curriculum, and coding was carried out. This is because both theory and practice are crucial in AI education, and even for identical content, the depth and level of content can vary depending on the tool used. The content analysis method is designed to reflect the uniqueness of the computing field. Consequently, a content analysis framework was developed to identify the composition of the educational tools in each chapter of the textbooks. The tools from textbook chapters were extracted and clustered based on their roles to structure the framework. Coding was performed based on this foundation.
The results of the tool analysis conducted through content analysis are presented in Table 11.
Table 11. Tool analysis result by textbook and area.
First, regarding the use of tools for data processing, including recognition, collection, analysis, and model generation, the following observations were made: In the “2. Principles and Application of AI” section, three textbooks (A, B, and C) utilized Quick, Draw! to facilitate the learning of image data recognition, whereas textbook F used code.org to provide hands-on experience with the supervised learning principles of AI. In the “3. Data and Machine Learning” section, textbooks B, G, and H used Orange3, whereas textbook C used Brightics AI to generate AI models based on various real-life data samples. Notably, some textbooks solely focused on theoretical explanations without incorporating specific tools. Each textbook used a distinct approach to explain the same concept, and only a few textbooks included practical training examples.
Second, regarding AI models and program development, the following observations were made: In the “2. Principles and Application of AI” section, five textbooks used ENTRY and one used Scratch, so that six textbooks in total used block-based programming languages. However, these textbooks did not cover program development using programming languages. In the “3. Data and Machine Learning” section, three textbooks used ENTRY and one used Scratch, resulting in four textbooks using block-based programming languages. In addition, three textbooks developed AI programs that processed the data using Python. Notably, in Units 2 and 3, two textbooks provided options for both text- and block-based programming languages. However, in two or three sections, certain textbooks did not include development practices that use programming languages.
Third, the analysis considered tools that support AI ethics. Seven out of the eight textbooks used the Moral Machine platform to facilitate ethical decision making in various dilemma situations.
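The coding frame described above, in which tools are clustered by role and tallied per unit, can be sketched as a nested mapping. The records below include only the textbook, unit, and tool combinations that the text names explicitly; the role labels and the function name `build_frame` are assumptions made for illustration.

```python
from collections import defaultdict

# (textbook, unit, tool) records named explicitly in the findings above.
records = [
    ("A", 2, "Quick, Draw!"), ("B", 2, "Quick, Draw!"), ("C", 2, "Quick, Draw!"),
    ("F", 2, "code.org"),
    ("B", 3, "Orange3"), ("G", 3, "Orange3"), ("H", 3, "Orange3"),
    ("C", 3, "Brightics AI"),
]

# Hypothetical role labels following the three perspectives of the analysis.
ROLE_OF_TOOL = {
    "Quick, Draw!": "data processing",
    "code.org": "data processing",
    "Orange3": "data processing",
    "Brightics AI": "data processing",
    "ENTRY": "program development",
    "Scratch": "program development",
    "Python": "program development",
    "Moral Machine": "AI ethics",
}


def build_frame(records, role_of_tool):
    """Group textbooks by (unit, role) and then by tool."""
    frame = defaultdict(lambda: defaultdict(set))
    for textbook, unit, tool in records:
        role = role_of_tool.get(tool, "uncategorized")
        frame[(unit, role)][tool].add(textbook)
    return frame


frame = build_frame(records, ROLE_OF_TOOL)
for (unit, role), tools in sorted(frame.items()):
    for tool, textbooks in sorted(tools.items()):
        print(f"Unit {unit} | {role} | {tool}: {sorted(textbooks)}")
```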

4.4. Results of the Alignment Evaluation between Curriculum and Textbooks

The AI curriculum should be structured to meet societal requirements because industrial society is evolving. Both K-12 education, which educates students who will go on to enter society, and higher education, which prepares students who are about to enter society, increasingly regard AI education as crucial. In other words, systematic AI curriculum development is meaningful because it positively impacts the cultivation of AI talent, AI research and development, and the direction of technological advancement.
The results of the textbook analysis are discussed as follows:
First, it is essential to review learning elements that are inadequately reflected in each curriculum section. Content knowledge considered important within each curriculum area should be examined to determine whether it is adequately addressed in textbooks. Second, the top-frequency keywords in textbooks that were not covered in the curriculum sections must be reviewed. These keywords serve as fundamental materials for assessing the appropriateness of a curriculum’s knowledge composition. Third, keywords that appear frequently in certain textbooks but are not included as learning elements in the curriculum should be considered for inclusion based on the TF-IDF analysis of the textbooks. Examples of such keywords include “neural networks” and “regression”. Fourth, if subsequent curriculum areas require the incorporation of specific learning elements, the prerequisite knowledge required for effective learning must be considered. Fifth, the division of units should be evaluated to determine if it is clearly structured according to the curriculum knowledge units. A clear division indicates a well-defined knowledge composition of textbook content. This analysis shows that the four curriculum areas are not adequately represented in textbooks. Finally, considering the significant disparity in tool usage across different curricula, it is necessary to propose clear criteria for using tools in teaching and learning. These criteria should be based on each area’s theoretical foundations and aligned with the objectives of practical training.

5. Discussion

Based on the aforementioned results, the directions for knowledge composition in the AI curriculum can be addressed from three perspectives.
First, curriculum composition should facilitate the reconstruction of curriculum knowledge by adopting a concept-based framework that aligns with competency-oriented education. As in other frameworks, such as the Data Science structure introduced by ACM/IEEE, the KAs of Computer Science, and the five big ideas of the K-12 AI Guidelines of AI4K12, the curriculum’s detailed areas and overall structure should be designed based on the knowledge or concepts within the field of AI. This approach supports the systematic organization and reconstruction of the curriculum composition.
The content elements within these areas should effectively systematize the competencies in knowledge, skills, values, and attitudes. It is crucial to specify the desired competencies students should possess after completing the curriculum, encompassing comprehensive development in terms of knowledge, skills, values, and attitudes within each area.
To manage and utilize the relationships between different items, a three-layered structure consisting of areas, detailed areas, and content elements should be systematically established and organized. This allows for efficient curriculum revision and enhancement from a curriculum management perspective, as well as effective reconstruction of the curriculum based on the field’s requirements from a curriculum application perspective. The order of the areas should not be constrained by prerequisites or subsequent knowledge. In other words, the AI curriculum’s knowledge composition should be such that the educational content can be reconstituted considering the curriculum composition and the purpose of textbook development.
Similar to frameworks like the CS 2013, CCDS 2021, and K-12 AI Curriculum, it is essential to propose document systems that map K-12 education to the content elements of each curriculum area. As educational and learner environments vary across countries and regions, it is necessary to stratify the content level, as observed in curricula from countries like India or China. This stratification ensures that lessons are designed according to the educational objectives or lesson goals, enabling AI educational content to maintain connectivity and relevance. Content can be extracted based on the specific environment and the learners’ proficiency level in each school, thereby facilitating the operation of curricula.
The second perspective pertains to the composition of knowledge. The content composition within the computing field is closely intertwined with and reflects technological advancements and societal changes. The Computing Curricula of ACM/IEEE, serving as the standard curriculum in the computing field, exhibits mutual influence and complementarity with K-12 curricula worldwide. Therefore, to develop a comprehensive AI curriculum, it is essential to adopt a top-down approach, focusing on the CCDS 2021 and CS 2013 standards provided by ACM/IEEE, while also taking into consideration the K-12 AI Curriculum established by internationally recognized institutions. A thorough examination of standard- and country-specific curricula reveals common elements, such as computing basics (algorithms, computing systems, data, programming, and ethics), AI concepts and principles, AI and its societal implications, the significance of data, modeling, and programming, and the ethical considerations of AI. Additionally, the curriculum should encompass AI convergence education (STEM) by emphasizing the integration of computing + X.
It is crucial to define the scope of knowledge encompassed within the AI curriculum. The AI field, as proposed in the standard computing curriculum of ACM/IEEE and the K-12 curricula of individual countries, covers education in AI application, principles, and convergence. Furthermore, AI education comprises AI knowledge, concepts and principles, and developer education. Hence, the curriculum content should be structured based on a clear definition of the curriculum’s scope.
Concepts that require emphasis across all areas should be designated as “cross-cutting” and should permeate the curriculum. This approach is akin to the cross-cutting themes of CSTA 2016 (“abstraction”, “system relation”, “human-computer interaction”, “personal information protection and security”, and “communication and cooperation”) and the big ideas of AP Computer Science (“modularity”, “variable”, “control”, and “impact of computing”) [55,56]. Within CCDS, the “Data Privacy, Security, Integrity, and Analysis for Security” area is defined as “cross-cutting” and relates to all KAs [5]. For example, “data”, “AI security”, “human-computer interaction”, and “modularity” can be incorporated as “cross-cutting” elements within the AI curriculum.
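One way to check that cross-cutting concepts actually reach every area is a simple coverage audit. The sketch below is illustrative only: the four cross-cutting concepts are the examples named in the text, whereas the area names, the concept assignments, and the `missing_cross_cutting` function are hypothetical.

```python
# Cross-cutting concepts given as examples in the text.
CROSS_CUTTING = {"data", "AI security", "human-computer interaction", "modularity"}

# Hypothetical areas and the concepts each one currently addresses.
area_concepts = {
    "AI concepts and principles": {
        "data", "modularity", "human-computer interaction", "AI security",
    },
    "Modeling and programming": {"data", "modularity"},
    "AI and society": {"data", "human-computer interaction", "AI security"},
}


def missing_cross_cutting(area_concepts, cross_cutting):
    """Map each area to the sorted cross-cutting concepts it does not cover."""
    gaps = {}
    for area, concepts in area_concepts.items():
        missing = sorted(cross_cutting - concepts)
        if missing:  # fully covered areas are omitted from the report
            gaps[area] = missing
    return gaps


gaps = missing_cross_cutting(area_concepts, CROSS_CUTTING)
for area, missing in gaps.items():
    print(f"{area}: missing {missing}")
```

An empty result would indicate that every designated cross-cutting concept appears in every area.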
The third perspective focuses on the support for practical training in AI education. Practical training should encompass various aspects, such as “experience”, “application”, “understanding principles”, “implementation (data processing, model design, program development)”, “ethics”, and “convergence”. Instead of prescribing specific tools for practical training in limited areas or content, education should advocate for the availability of a diverse range of tools to teach and learn the same content.
In practical training, two important considerations are the “expansion of data processing possibilities” and “stability”. The data used in AI education reflects the society and culture of each country, so data bias may be a relative concept. Therefore, instead of focusing on specific types of data, a wide variety of data should be provided, with its safety ensured in terms of privacy, security, and copyright for all students. Starting from practice with the data introduced in textbooks, students should have access to data through secure cloud links, enabling them to design AI models or develop programs that reflect ethical values and apply to real-life situations and diverse contexts.
Another aspect to consider is the flexibility of curriculum application. From an educational environment perspective, flexibility should be provided in time allocation and in the balance between theory and practical training. From a content perspective, flexibility should encompass the differentiation of content, the sequencing of topics, the balance between theory and practice, and adaptability to different levels of education. Autonomous and active curriculum content should be formulated and implemented by proposing the order of curriculum areas, the reconstitution of content, the integration of theory and practice based on the school level, and the application of data.

6. Conclusions

This study contributes a systematic evaluation of the alignment between the AI curriculum and textbooks through text mining and content analysis. The text analysis method developed here enables a comprehensive examination of textbooks within the computing field’s curriculum. In particular, the study demonstrates the potential to quantify and expand textbook analysis by integrating a text mining technique, which can process a massive amount of textbook text data, with a frame-based content analysis that accounts for the specificities of computing textbooks. The findings provide a foundation for further investigations of the relationship between curriculum and textbooks using larger samples from the computing field. Finally, the directions proposed for AI curriculum knowledge composition are particularly important because they establish a basis for developing K-12 curricula within the computing field.
AI education based on a systematic AI curriculum will serve as a foundation for nurturing talent equipped with basic AI literacy, convergence capabilities (AI + X, X + AI), and professional skills in industrial and academic fields. In other words, this study is significant in that it provides technology and basic data that can contribute to industry, academia, and education in the AI era.

Author Contributions

Conceptualization, H.Y. and W.L.; methodology, H.Y., J.K. and W.L.; software, H.Y.; validation, W.L., J.K. and H.Y.; writing—original draft preparation, H.Y.; writing—review and editing, H.Y. and J.K.; supervision, J.K. and W.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2021R1A2C2013735).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Dondi, M.; Klier, J.; Panier, F.; Schubert, J. Defining the Skills Citizens Will Need in the Future World of Work; McKinsey & Company: Tokyo, Japan, 2021; p. 25.
  2. OECD. An OECD Learning Framework 2030; Springer International Publishing: Cham, Switzerland, 2019; pp. 23–35.
  3. Miao, F.; Shiohira, K. K-12 AI Curricula. A Mapping of Government-Endorsed AI Curricula; UNESCO: Paris, France, 2022.
  4. Clear, A.; Parrish, A.; Impagliazzo, J.; Wang, P.; Ciancarini, P.; Cuadros-Vargas, E. Computing Curricula 2020 (CC2020): Paradigms for Future Computing Curricula; ACM/IEEE Computer Society: New York, NY, USA, 2020.
  5. Danyluk, A.; Leidig, P.; Cassel, L.; Servin, C. Computing competencies for undergraduate data science curricula: ACM Data Science Task Force. In Proceedings of the 52nd ACM Technical Symposium on Computer Science Education, Virtual, 13–20 March 2021.
  6. Draft, S. Computing Science Curricula 2013 (CS2013); ACM/IEEE: New York, NY, USA, 2013.
  7. AI4K12. Available online: https://ai4k12.org/ (accessed on 1 May 2023).
  8. CBSE. Artificial Intelligence (Sub. Code 843) Class—XI&XII CBSE Department of Skill Education Curriculum for Session 2021–2022; CBSE: New Delhi, India, 2021.
  9. Ministry of Education of the People’s Republic of China. Information Technology Curriculum Standards for Ordinary High Schools; Ministry of Education of the People’s Republic of China: Beijing, China, 2017.
  10. Ministry of Education. Ministry of Education Announcement No. 2015-74 [Supplementary Book 10]: Curriculum Guidelines for Practical Subjects (Technology/Home Economics) and Informatics Studies; Ministry of Education: Sejong-si, Republic of Korea, 2020.
  11. Astiz, M.F.; Wiseman, A.W.; Baker, D.P. Slouching towards decentralization: Consequences of globalization for curricular control in national education systems. Comp. Educ. Rev. 2002, 46, 66–88.
  12. Mok, K.H. (Ed.) Centralization and Decentralization: Educational Reforms and Changing Governance in Chinese Societies; Springer Science & Business Media: Cham, Switzerland, 2013.
  13. Gumilar, S.; Hadianto, D.; Amalia, I.F.; Ismail, A. The portrayal of women in Indonesian national physics textbooks: A textual analysis. Int. J. Sci. Educ. 2022, 44, 416–433.
  14. Aivelo, T.; Neffling, E.; Karala, M. Representation for whom? Transformation of sex/gender discussion from stereotypes to silence in Finnish biology textbooks from 20th to 21th century. J. Biol. Educ. 2022, 1–15.
  15. Ho, Y.-R. Indigenous language curriculum revival: An emancipatory education analysis of Taiwanese Indigenous language policy and textbooks. J. Curric. Stud. 2022, 54, 501–519.
  16. Wang, T.; Ma, Y.; Ling, Y.; Wang, J. Integrated STEM in high school science courses: An analysis of 23 science textbooks in China. Res. Sci. Technol. Educ. 2021, 41, 1197–1214.
  17. Zhang, Q.-P.; Wong, N.-Y. The Learning Trajectories of Similarity in Mathematics Curriculum: An Epistemological Analysis of Hong Kong Secondary Mathematics Textbooks in the Past Half Century. Mathematics 2021, 9, 2310.
  18. Pinson, H.; Agbaria, A.K. Ethno-nationalism in citizenship education in Israel: An analysis of the official civics textbook. Br. J. Sociol. Educ. 2021, 42, 733–751.
  19. Chen, K.; Zhou, J.; Lin, J.; Yang, J.; Xiang, J.; Ling, Y. Conducting Content Analysis for Chemistry Safety Education Terms and Topics in Chinese Secondary School Curriculum Standards, Textbooks, and Lesson Plans Shows Increased Safety Awareness. J. Chem. Educ. 2020, 98, 92–104.
  20. Heemann, T.; Hammann, M. Towards teaching for an integrated understanding of trait formation: An analysis of genetics tasks in high school biology textbooks this paper was presented at the ERIDOB conference 2020. J. Biol. Educ. 2020, 54, 191–201.
  21. Lucy, L.; Demszky, D.; Bromley, P.; Jurafsky, D. Content Analysis of Textbooks via Natural Language Processing: Findings on Gender, Race, and Ethnicity in Texas U.S. History Textbooks. AERA Open 2020, 6, 2332858420940312.
  22. Sakhovskiy, A.; Solovyev, V.; Solnyshkina, M. Topic Modeling for Assessment of Text Complexity in Russian Textbooks. In Proceedings of the 2020 Ivannikov Ispras Open Conference (ISPRAS), Moscow, Russia, 10–11 December 2020; IEEE: New York, NY, USA, 2020; pp. 102–108.
  23. BouJaoude, S.; Noureddine, R. Analysis of science textbooks as cultural supportive tools: The case of Arab countries. Int. J. Sci. Educ. 2020, 42, 1108–1123.
  24. González-Delgado, M.; Lorenzo, M.F.; Machado-Trujillo, C. The concept of the State in textbooks: Analysis and reinterpretation during the Spanish Transition to Democracy (1976–1986). Br. J. Educ. Stud. 2020, 68, 331–347.
  25. Hyun-joo, P.; Kwon, J. Analysis of inquiry tendencies in high-level middle school 1 chemistry textbooks during the Kim Jong-un era in North Korea. J. Korean Chem. Soc. 2019, 63, 266–279.
  26. Rusek, M.; Vojíř, K. Analysis of text difficulty in lower-secondary chemistry textbooks. Chem. Educ. Res. Pract. 2019, 20, 85–94.
  27. Yun, E.; Park, Y. Extraction of scientific semantic networks from science textbooks and comparison with science teachers’ spoken language by text network analysis. Int. J. Sci. Educ. 2018, 40, 2118–2136.
  28. Choi, G.S.; Lee, J.Y.; Yoon, H.S. Development of a quantitative analysis model of creative problem solving ability in computer textbooks. Clust. Comput. 2015, 18, 733–745.
  29. Cohen, R.; Yarden, A. How the Curriculum Guideline “The Cell Is to Be Studied Longitudinally” Is Expressed in Six Israeli Junior-High-School Textbooks. J. Sci. Educ. Technol. 2010, 19, 276–292.
  30. Lei, L. Text Analysis with R for Students of Literature. J. Quant. Linguist. 2016, 23, 228–233.
  31. Dieng, A.B.; Ruiz, F.J.; Blei, D.M. The dynamic embedded topic model. arXiv 2019, arXiv:1907.05545.
  32. Ferreira-Mello, R.; André, M.; Pinheiro, A.; Costa, E.; Romero, C. Text mining in education. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2019, 9, e1332.
  33. Rezgui, Y. Text-based domain ontology building using Tf-Idf and metric clusters techniques. Knowl. Eng. Rev. 2007, 22, 379–403.
  34. Mcauliffe, J.; Blei, D. Supervised topic models. Adv. Neural Inf. Process. Syst. 2007, 20, 1–8.
  35. Hoffman, M.; Bach, F.; Blei, D. Online learning for latent dirichlet allocation. Adv. Neural Inf. Process. Syst. 2010, 23, 1–9.
  36. Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent dirichlet allocation. J. Mach. Learn. Res. 2003, 3, 993–1022.
  37. Chen, S.; Wu, L.; Zhuo, J. The Application of Unsupervised Learning TF-IDF Algorithm in Word Segmentation of Ideological and Political Education. Wirel. Commun. Mob. Comput. 2022, 2022, 5219117.
  38. Fukushima, Y.; Shin, M.; Miyazaki, K.; Ito, T.; Yonekura, R.; Tanaka, M.S. Report Search Function Using TF-IDF for PBL Education. In Proceedings of the 2020 IEEE 9th Global Conference on Consumer Electronics (GCCE), Kobe, Japan, 13–16 October 2020; IEEE: New York, NY, USA, 2020; pp. 802–803.
  39. Lee, D.; Kwon, H. Keyword analysis of the mass media’s news articles on maker education in South Korea. Int. J. Technol. Des. Educ. 2020, 32, 333–353.
  40. Sekiya, T.; Matsuda, Y.; Yamaguchi, K. Mapping analysis of CS2013 by supervised LDA and isomap. In Proceedings of the 2014 IEEE International Conference on Teaching, Assessment and Learning for Engineering (TALE), Wellington, New Zealand, 8–10 December 2014; IEEE: New York, NY, USA, 2014; pp. 33–40.
  41. Wen, Y.; Zhao, X.; Li, X.; Zang, Y. Explaining the Paradox of World University Rankings in China: Higher Education Sustainability Analysis with Sentiment Analysis and LDA Topic Modeling. Sustainability 2023, 15, 5003.
  42. Cutumisu, M.; Guo, Q. Using Topic Modeling to Extract Pre-Service Teachers’ Understandings of Computational Thinking From Their Coding Reflections. IEEE Trans. Educ. 2019, 62, 325–332.
  43. Altamirano, M.; Uribe, P.; Schlotterbeck, D.; Jiménez, A.; Araya, R.; Moris, J.v.d.M.; Caballero, D. Unsupervised characterization of lessons according to temporal patterns of teacher talk via topic modeling. Neurocomputing 2022, 484, 211–222.
  44. Gurcan, F.; Cagiltay, N.E. Big Data Software Engineering: Analysis of Knowledge Domains and Skill Sets Using LDA-Based Topic Modeling. IEEE Access 2019, 7, 82541–82552.
  45. Kumsung. Introduction to Artificial Intelligence; Kumsung: Seoul, Republic of Korea, 2021.
  46. Gilbut. Introduction to Artificial Intelligence; Gilbut: Gyeonggi, Republic of Korea, 2021.
  47. MiraeN. Introduction to Artificial Intelligence; MiraeN: Jeonnam, Republic of Korea, 2021.
  48. Visang. Introduction to Artificial Intelligence; Visang: Seoul, Republic of Korea, 2021.
  49. Samyang. Introduction to Artificial Intelligence; Samyang: Seoul, Republic of Korea, 2021.
  50. Seongandang. Introduction to Artificial Intelligence; Seongandang: Seoul, Republic of Korea, 2021.
  51. Cmass. Introduction to Artificial Intelligence; Cmass: Seoul, Republic of Korea, 2021.
  52. Chunjaetext. Introduction to Artificial Intelligence; Chunjaetext: Seoul, Republic of Korea, 2021.
  53. Park, E.L.; Cho, S. KoNLPy: Korean natural language processing in Python. In Proceedings of the 26th Annual Conference on Human & Cognitive Language Technology, Chuncheon, Republic of Korea, 22 April–13 May 2014; Volume 6, pp. 133–136.
  54. Hidayatullah, A.F.; Ma’arif, M.R. Road traffic topic modeling on Twitter using latent dirichlet allocation. In Proceedings of the 2017 International Conference on Sustainable Information Engineering and Technology (SIET), Batu City, Indonesia, 24–25 November 2017; IEEE: New York, NY, USA, 2017; pp. 47–52.
  55. K-12 Computer Science Framework Steering Committee. K-12 Computer Science Framework; ACM: New York, NY, USA, 2016.
  56. College Board. College Board AP® Computer Science A Course and Exam Description; College Board: New York, NY, USA, 2020.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
