Development of an Artificial Intelligence-Based Text Sentiment Analysis System for Evaluating Learning Engagement Levels in STEAM Education

Wu, Chih-Hung; Peng, Kang-Lin

doi:10.3390/app15084304

by Chih-Hung Wu ¹ and Kang-Lin Peng ²,*

¹ Department of Digital Content and Technology, National Taichung University of Education, Taichung 400, Taiwan
² Faculty of International Tourism and Management, City University of Macau, Macau, China
* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(8), 4304; https://doi.org/10.3390/app15084304

Submission received: 25 January 2025 / Revised: 18 March 2025 / Accepted: 31 March 2025 / Published: 14 April 2025

(This article belongs to the Special Issue Application of Information Systems)

Abstract

This study aims to create an AI system that analyzes text to evaluate student engagement in STEAM education. It explores how sentiment analysis can measure emotional, cognitive, and behavioral involvement in learning. We developed an AI-based text sentiment analysis system to assess learning engagement, integrating speech recognition, natural language processing techniques, keyword analysis, and text sentiment analysis. A computational thinking curriculum and learning sheets were developed for university students, and students’ participation experiences were collected using these learning sheets. The study utilized the strengths of SnowNLP and Jieba, proposing a hybrid model to perform sentiment analysis on students’ learning experiences. We analyzed (1) the effect of sentiment dictionaries on the model’s accuracy, (2) the accuracy of different models, and (3) keywords. The results indicated that different sentiment dictionaries had a significant impact on the model’s accuracy. The hybrid model proposed in this study, utilizing the NTUSD sentiment dictionary, outperformed the other four models in effectively analyzing learners’ emotions. Keyword analysis indicated that teaching materials or courses designed to promote practical, fun, and easy ways of thinking and building logic helped students develop positive emotions and enhanced their learning engagement. The most frequently occurring keywords associated with negative emotions were “problem”, “error”, “not”, and “mistake”. This finding suggests that learners experiencing challenges during the learning process, such as encountering mistakes, errors, or unexpected outcomes, are likely to develop negative emotions, which in turn decrease their engagement in learning.

Keywords:

artificial intelligence; text sentiment analysis; engagement; learning performance assessment

1. Introduction

The integration of Artificial Intelligence (AI) into education has introduced new approaches for evaluating student engagement, particularly in Science, Technology, Engineering, Arts, and Mathematics (STEAM) programs. This paper explores the development of an AI-based text sentiment analysis system designed to evaluate learning engagement levels by analyzing textual data from students, such as responses, feedback, and discussions. By utilizing natural language processing techniques, this system aims to provide educators with actionable insights to enhance teaching strategies and foster a more interactive and effective STEAM learning environment.

Bloom’s taxonomy, introduced in 1956, originally categorized learning objectives into six ascending levels. A recent study [1] updated the taxonomy to encompass these levels: remembering, understanding, applying, analyzing, evaluating, and creating, as illustrated in Figure 1. According to Bloom’s framework, the highest level of learning is the ability to produce original work. This study adopted this framework to design its research objectives [1].

STEAM integrates the disciplines of science, technology, engineering, art, and mathematics, and STEAM education assists in developing problem-solving skills by fostering cross-disciplinary integration [2,3]; hands-on STEM activities designed through 6E orientation can help enhance technological literacy [3]. Computational thinking (CT) is designed to train problem-solving methods, help students understand the nature of the problem, and develop possible solutions to complex problems. These solutions are then presented in a manner that can be understood by computers, humans, or both [4]. Therefore, the main focus of CT is learning to express problems in a programmable form, solve them systematically, and practice creativity [5,6]. To reach the sixth level of Bloom’s learning goals, students must be able to create their own products [1]. Further, inquiry-based learning involves asking questions, making observations, conducting research to discover information that has been recorded, developing experimental methods and tools for data collection, collecting, analyzing, and interpreting data, outlining possible explanations, and creating predictions for future research. This helps learners explore learning topics, achieve deeper learning, and create their own work [7].

Text sentiment analysis, also known as opinion mining or emotion AI, involves studying people’s opinions, sentiments, evaluations, appraisals, attitudes, and emotional states, such as happiness, fear, and surprise [8]. Sentiment analysis is a valuable tool for gaining insights from social networks [9] and has been widely applied in various domains, including assessing learning engagement in STEAM education, analyzing social media trends in financial markets [10], evaluating customer satisfaction [11], and measuring employee engagement during digital transformation [12].

Learning engagement is crucial for enhancing learning performance, as highly engaged students are more likely to actively participate in class, understand and retain information, and achieve superior learning outcomes. An AI-based sentiment analysis system can evaluate learners’ engagement by analyzing the sentiments expressed in their written feedback, forum posts, or other textual communications. Therefore, the purpose of this study is to develop an AI-based sentiment analysis system to evaluate learners’ engagement levels in education.

2. Literature Review

2.1. Learning Engagement

In many constructivist approaches to teaching and learning, group work and collaboration are encouraged so that students can actively learn and share ideas in groups or among peers to deepen their understanding of the subject matter [13]. This approach to learning through collaboration and assistance can therefore be applied to different types of learning, such as problem-based learning [5,14], project-based learning, and inquiry-based learning [7]; collaborative learning is also widely used in STEM education research as the main method of curriculum delivery [13,14,15,16,17]. A technology acceptance model for high school STEM action learning was proposed, and it was found that social influence directly affects the intention to use STEM, meaning that community interaction has a significant impact on STEM teaching and learning activities. Therefore, this study adopted a cooperative learning approach for the design of lesson plans.

Among the indicators for assessing the effectiveness of cooperative learning, learning engagement is important [18,19,20]. According to the definition of engagement [21], the measurement of engagement can be divided into three dimensions: behavioral, affective, and cognitive. Behavioral engagement represents the extent to which students actively participate in learning activities. These indicators include the amount of time spent on learning activities or the extent to which they interact with other people and are usually measured by observable behaviors. Cognitive engagement represents the extent to which students expend brainpower to understand the content of the curriculum and is assessed by motivation, self-regulation, or deep learning [22]. Luo et al. (2020) [22] determined behavioral, affective, and cognitive engagement measures by observing interactive behaviors of students during STEM activities. Beyond observation, we employed a learning engagement questionnaire that assessed three dimensions—affective, behavioral, and cognitive engagement—through eight statements rated on a Likert scale [23]. The questionnaire demonstrated adequate reliability, with Cronbach’s alpha values ranging from 0.71 to 0.80. A previous study developed an Online Student Engagement Scale (OSES) to determine the actual level of student engagement in learning [24]. The OSES has been rigorously tested for its reliability and validity. The scale consists of 16 questions divided into four dimensions: skills, affect, engagement, and performance. The “skills” questions focus on how students interact with the content, the “emotions” questions focus on how to make the content interesting and useful, and the “engagement” questions focus on how students interact with others. The “performance” questions focused on test scores. Questionnaires are usually administered at the end of the learning process, so their accuracy is easily influenced by students’ personal factors and lack of objectivity. 
Therefore, methods to assess real-time participation by recording behaviors during the learning process have also been developed. The earliest measures of real-time engagement were based on log files generated from online learning systems, which provided detailed information about the timing, frequency, antecedents, and outcomes of learning activities and allowed researchers to examine which types of learning activities caused changes in student engagement.

2.2. Engagement Assessment

The second-generation real-time engagement assessment was recorded using physiological or neurological sensors. The sensor data analysis method focused on analyzing the data collected by various sensors during the learning process to assess engagement. Commonly used sensors included electroencephalograms, electrocardiograms [25], pressure sensors [26], eye-tracking [27], and skin potentials [28]. The physiological data collected by these sensors can be converted and validated to represent the learners’ real-time learning data. However, the wearing of certain devices may cause discomfort and affect the learning outcomes of the students; thus, third-generation automatic recognition technologies that use advanced computational techniques (e.g., deep learning models) are used to estimate an individual’s learning engagement by analyzing facial expressions or body postures [29,30,31]. However, the computer vision-based approach has limitations. First, because the process of training the classifier relies on manual judgment, there is no guarantee that errors caused by manual judgment can be excluded; second, a single feature cannot represent the total learning engagement. For example, facial expressions can be used to measure emotional engagement [29] and cognitive engagement [30], whereas skin potential shows a moderate correlation between cognitive engagement and behavioral engagement [28].

In previous collaborative learning studies, participants’ conversations and actions were recorded through video, and their behaviors were coded to analyze their learning performance. Because this study adopts a cooperative learning approach, it is necessary to deal with situations where multiple subjects interact at the same time. Therefore, this study proposes the development of a speech recognition system that records student interactions during collaborative learning through audio recording, which is automatically converted into text by the speech recognition system and encoded with fuzzy comparisons based on a set of keywords. The advantage of using speech recognition is that the learning process can be recorded without affecting the subject’s activity, and the collation of research data through speech recognition and AI can significantly improve the accuracy and reduce possible errors of manual recognition. Current studies adopted various deep learning networks, such as a recurrent neural network (RNN) coupled with an attention mechanism, to develop a prompted-enhanced sentiment analysis [32] or aspect-based sentiment system [33].

However, to compensate for the fact that speech recognition may not represent the full range of engagement, previous studies coded engagement by analyzing participant interactions in video recordings [34]. The coding of engagement in the video was guided by behaviors outlined in the study’s definition [35] and those associated with instructional behavioral engagement [36], which identifies whether the participant is actively engaged by observing behavioral actions. In summary, measuring engagement through speech recognition provides an objective measure that does not interfere with student learning and is an efficient and effective measurement tool [25]. A review of past research suggests that it is appropriate to measure learning engagement with physiological signals, such as gestures and speech sounds, and that there is no research that uses speech recognition of text followed by text analysis techniques to provide a measure of learning engagement.

2.3. Our Contributions

This research presents the design and implementation of a novel system that integrates artificial intelligence (AI) voice recognition, natural language processing (NLP), and sentiment analysis to evaluate learners’ emotional engagement based on textual data derived from learning sheets and real-time discussions. The study employed a structured methodology for collecting and processing educational text data, utilizing learning sheets completed by 129 first- and second-year university students during a two-week computational thinking (CT) course. The researchers introduced a participation recognition system that leverages AI speech-to-text conversion and emotion scoring to assess real-time engagement during student discussions. We provide empirical evidence of the system’s effectiveness through an analysis of students’ learning texts, which yields a dictionary of feelings and engagement indicators. Although sentiment analysis has been widely applied in other domains, this work uniquely adapts it to the STEAM education context, focusing on CT and inquiry-based learning. By linking learners’ textual responses and emotional scores to their perceptions of course utility, the researchers bridged the gap between emotional engagement and educational outcomes, thus contributing to the growing body of research on AI-enhanced pedagogy.

3. Research Methodology

In this study, we propose an AI-based text sentiment analysis system to evaluate learners’ engagement in STEAM education within a Computational Thinking (CT) and inquiry-based learning curriculum. The system comprises four integrated models: (1) AI Voice Recognition Model: Using the Google Voice Recognition API, this model converts student discussion audio into text for engagement analysis. (2) Word-Breaking Model: Using Jieba Chinese Natural Language Processing, this model segments text into keywords and removes stop words to prepare data for sentiment analysis. (3) Keyword Analysis Model: This model filters and analyzes keywords, labels them as high- or low-participation, and generates matrices and word clouds to identify engagement patterns. (4) Emotion Analysis Model: This model uses various sentiment dictionaries to calculate positive and negative emotion scores per sentence (Equations (1)–(3)) and aggregates these into participation scores. The purpose of this study is to provide educators with a tool to assess and enhance emotional engagement in STEAM by correlating sentiment with perceived course helpfulness, as validated through university student data.

3.1. Data Collection Method

In this study, a curriculum combining computational thinking (CT) with inquiry-based learning was designed, together with learning sheets for collecting students’ learning experiences; the learning sheets were used as input for the subsequent development of a text–emotion analysis model. The input variable is the user’s response to the learning sheet (Learning Sheet Q2), and the output variable is whether the learner thinks the CT course learning activity is helpful (Learning Sheet Q3), with “Yes” indicating that it is helpful and “No” indicating that it is not. The course was conducted for first- and second-year university students over a two-week period. The course content was CT and inquiry-based learning (Python fundamentals), and the learning sheets were completed after the course. In total, 129 learning texts were collected. The learning sheets were designed as follows.

3.2. Computational Thinking + Inquiry Learning #1 (Python Fundamentals) Learning Sheet Training

In the first course, students were asked to complete the Foundations of Computational Thinking (Python) learning sheet, which was used to train the basic concepts of CT. Students answered the questions on the sheet and were also given discussion topics and inquiry-based learning methods so that they could discuss their answers with each other. The design of the learning sheet is shown in Table 1.

Prg #D refers to Program 4 in our design learning activity. In the coding practice class, the teacher first teaches and demonstrates several programs. Then, the students are required to complete the result of Program Example 4 and fill out the learning sheet. The goal of this program is to allow the user to input a number n and then display the left half of a triangle. For example, if the user enters 3, the program will display three rows of the left half of a triangle.
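As an illustration of the task described above, the following minimal Python sketch builds the rows of such a triangle. The exact output format of Program 4 is not given in the text, so the asterisk symbol, the right-aligned layout (read here as the left half of a centered triangle), and the function name `left_half_triangle` are all assumptions made for this example.

```python
from typing import List


def left_half_triangle(n: int, symbol: str = "*") -> List[str]:
    """Return n rows; row i holds i symbols, right-aligned to width n.

    The layout is an assumption: the source only says that the program
    displays "the left half of a triangle" with n rows.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    return [(symbol * i).rjust(n) for i in range(1, n + 1)]


if __name__ == "__main__":
    # The example from the text: entering 3 yields three rows.
    for row in left_half_triangle(3):
        print(row)
```

Returning the rows as a list rather than printing them directly keeps the decomposition, pattern, and algorithm steps of the learning sheet easy to inspect one at a time.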

The learning sheet consists of three main sections:

(1): Program Objectives and Tasks: The goals and tasks that the program needs to accomplish.
(2): Computational Thinking (CT) Steps: The four steps of computational thinking (breaking down the problem, pattern recognition, abstraction, and algorithm).
(3): Learning Reflections: Open-ended questions, including Q1: Explain your CT steps; Q2: Describe your learning experience; Q3: Evaluate your own learning outcomes.

In the second section focusing on computational thinking, the first column lists the four CT steps, the second column is for learners to record their analysis based on these steps, and the third column contains a reference demo provided by the teacher.

3.3. Participation Recognition System for AI Voice Recognition/Text Emotion Recognition System for Creative Education

We created a participation recognition system that integrates AI speech and emotion recognition for creative education. The system was used to calculate and analyze the engagement level of real-time learning in STEAM education. The entire system architecture is shown in Table 2 and Figure 2. It includes the following modules.

(1): AI voice recognition system: By linking to the Google Voice Recognition API, it converted the student discussion process voice into text.
(2): Word-breaking system: Through natural language processing (NLP) technology, Jieba Chinese word-breaking technology was used to break the entire sentence. Thus, it could be turned into keywords that were analyzed and stored in the back-end web database (MySQL).
(3): Keyword analysis system: After obtaining the word breaks, the keywords were manually labeled as high- and low-participation groups through the keyword filtering system. Then, using keyword analysis technology, we analyzed important keywords. This study adopted keyword analysis technology that used word segmentation techniques to break text into meaningful words. After removing stop words, the frequency of each word was calculated to analyze important keywords, and finally, a word cloud was created based on these frequencies.
(4): Emotion analysis system: After the keywords were extracted, the students’ emotions of each sentence were analyzed through the emotion analysis system, combined with the emotion dictionary, and the positive and negative emotion scores of each sentence were calculated; the emotion scores of each student’s sentences were summed up to obtain the emotion participation score of each student.

Table 2. System environment.

Win10 + Apache Web Server
Python, Anaconda, TensorFlow, Keras, Jieba, NLP module.
Front-end HTML, JavaScript, PHP programs
Back-end MySQL database, Flask Web Services

Figure 2. AI speech recognition/emotion recognition system architecture.

At the beginning of the experiment, the participants entered their student numbers into the AI voice recognition system. The recording then began, and the system converted the subject’s speech into text and immediately stored the recognized text in the back-end web database. The system also analyzed the user’s emotions during the conversation, which served as the analysis information for the participation level.

3.4. Keyword Extraction

In this study, we performed keyword selection after word-breaking by referring to the procedure in [37]. The system used the NLP suite, and Jieba was used for word extraction analysis to compare its effectiveness. (1) First, we used Jieba to break the text into words; unnecessary stop words (such as the symbols [ ], 、, <, >) were removed during data cleaning. (2) The word-break results were stored as a list of words separated by blanks. (3) Sentences were manually marked as high- or low-participation and provided to the system. (4) Keywords were filtered using various methods in Python, and the keyword counts were converted into a matrix. (5) Keyword analysis was performed using word clouds.
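The cleaning, counting, and matrix steps above can be sketched in plain Python as follows. This is a minimal illustration rather than the system's actual code: it assumes each sentence has already been segmented into tokens (done by Jieba in the study), it uses English glosses in place of Chinese tokens, and the names `remove_stop_words`, `keyword_matrix`, and `frequent_keywords` are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple

# Symbols named in the text as examples of stop words.
STOP_WORDS: Set[str] = {"[", "]", "、", "<", ">"}


def remove_stop_words(tokens: List[str], stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Drop stop words and empty or whitespace-only tokens."""
    return [t for t in tokens if t.strip() and t not in stop_words]


def keyword_matrix(docs: List[List[str]]) -> Tuple[List[str], List[List[int]]]:
    """Build a sentence-by-keyword count matrix (rows: sentences, columns: vocabulary)."""
    vocab = sorted({t for doc in docs for t in doc})
    column = {word: i for i, word in enumerate(vocab)}
    matrix = []
    for doc in docs:
        row = [0] * len(vocab)
        for token in doc:
            row[column[token]] += 1
        matrix.append(row)
    return vocab, matrix


def frequent_keywords(docs: List[List[str]], min_count: int) -> List[Tuple[str, int]]:
    """Return (keyword, total count) pairs with at least min_count occurrences."""
    totals = Counter(t for doc in docs for t in doc)
    return [(word, n) for word, n in totals.most_common() if n >= min_count]
```

The `(keyword, count)` pairs returned by `frequent_keywords` are the kind of frequency table a word cloud library would take as input.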

3.5. Emotional Engagement Level Calculation

After systematic word determination using the NLP package, the keywords in each sentence were compared with the sentiment dictionary using the Chinese Sentiment dictionary [38] and NTUSD (National Taiwan University Sentiment Dictionary) [39] to determine the number of positive and negative words in each sentence. The NTUSD is a dictionary of Chinese words published in 2006 and is available free of charge for academic use. The dictionary includes 11,088 semantic words in both simplified and traditional Chinese, including 2812 positive and 8276 negative words. The dictionary is labeled as positive or negative only, with no distinction between intensity or other additional information [40].

A second emotion dictionary integrates the China HowNet emotional dictionary with additional words collected through web crawling [39,41]; it contains a total of 7376 positive words and 12,646 negative words. The original text is in simplified Chinese. After converting it into traditional Chinese, it was used as another emotion dictionary.
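The dictionary comparison described above amounts to counting how many tokens of a sentence appear in the positive and negative word lists. The sketch below shows one way to do this, assuming the sentence is already segmented. The tiny word sets are toy stand-ins (English glosses, not actual entries of NTUSD or the second dictionary), and the function name `count_sentiment_words` is hypothetical.

```python
from typing import Iterable, Set, Tuple

# Toy stand-ins for a sentiment dictionary; the real dictionaries contain
# thousands of Chinese words and are not reproduced here.
POSITIVE: Set[str] = {"easy", "interesting", "practical"}
NEGATIVE: Set[str] = {"problem", "error", "mistake"}


def count_sentiment_words(
    tokens: Iterable[str], positive: Set[str], negative: Set[str]
) -> Tuple[int, int]:
    """Return (fp, fn): frequencies of positive and negative words in one sentence.

    Every occurrence is counted, so a word repeated twice contributes 2.
    """
    fp = fn = 0
    for token in tokens:
        if token in positive:
            fp += 1
        if token in negative:
            fn += 1
    return fp, fn


if __name__ == "__main__":
    sentence = ["loops", "easy", "but", "error", "error"]
    print(count_sentiment_words(sentence, POSITIVE, NEGATIVE))
```

Because the dictionaries are passed in as arguments, the same function can be run with either dictionary to compare their effect on the resulting counts.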

In this study, the sentiment calculation formula [40] was modified, as shown in Equations (1)–(3). Here, $N_{c_i}$ denotes the negative score of sentence $c_i$, and $\mathit{fn}_{c_i}$ is the frequency of negative words in sentence $c_i$; $P_{c_i}$ denotes the positive score of sentence $c_i$, and $\mathit{fp}_{c_i}$ is the frequency of positive words in sentence $c_i$. Equation (1) calculates the negative score of a sentence in the learner discussion after converting speech to text. Equation (2) calculates the positive score of a learner’s sentence. Equation (3) calculates the sentiment score $S_{c_i}$, which indicates the positive or negative intention of the sentence. The sentiment score was then used as an indicator of learners’ immediate emotional engagement levels.

$$N_{c_i} = \frac{\mathit{fn}_{c_i} / \sum_{j=1}^{m} \mathit{fn}_{c_j}}{\mathit{fp}_{c_i} / \sum_{j=1}^{n} \mathit{fp}_{c_j} + \mathit{fn}_{c_i} / \sum_{j=1}^{m} \mathit{fn}_{c_j}} \tag{1}$$

$$P_{c_i} = \frac{\mathit{fp}_{c_i} / \sum_{j=1}^{n} \mathit{fp}_{c_j}}{\mathit{fp}_{c_i} / \sum_{j=1}^{n} \mathit{fp}_{c_j} + \mathit{fn}_{c_i} / \sum_{j=1}^{m} \mathit{fn}_{c_j}} \tag{2}$$

$$S_{c_i} = P_{c_i} - N_{c_i} \tag{3}$$
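A direct Python transcription of Equations (1)–(3) may make the calculation easier to follow. It is a sketch under stated assumptions: each sentence is given as a pair of counts (positive words, negative words), the sums run over all sentences passed in, and sentences with no sentiment words are scored 0 to avoid division by zero. The text does not specify this last case, and the function name `sentiment_scores` is hypothetical.

```python
from typing import List, Tuple


def sentiment_scores(counts: List[Tuple[int, int]]) -> List[Tuple[float, float, float]]:
    """Compute (N, P, S) per sentence from (fp, fn) word counts.

    N and P follow Equations (1) and (2): each count is first divided by
    its total over all sentences, then the two shares are normalized so
    that N + P = 1. S = P - N follows Equation (3).
    """
    total_fp = sum(fp for fp, _ in counts)
    total_fn = sum(fn for _, fn in counts)
    results = []
    for fp, fn in counts:
        pos_share = fp / total_fp if total_fp else 0.0
        neg_share = fn / total_fn if total_fn else 0.0
        denom = pos_share + neg_share
        if denom == 0:
            # Assumption: a sentence without sentiment words gets score 0.
            results.append((0.0, 0.0, 0.0))
            continue
        n_score = neg_share / denom
        p_score = pos_share / denom
        results.append((n_score, p_score, p_score - n_score))
    return results
```

For example, with counts `[(2, 0), (0, 1), (1, 1)]` the totals are 3 positive and 2 negative words, so the third sentence has shares 1/3 and 1/2, giving P = 0.4, N = 0.6, and S = −0.2. By construction S always lies in [−1, 1].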

The emotion corpus is based on the NTUSD (National Taiwan University Sentiment Dictionary), published in 2006, which provides 2812 positive and 8276 negative Chinese words and is widely utilized for academic sentiment analysis in Chinese texts. However, its static nature and potential lack of intensity gradation limited its applicability to our dynamic, education-focused dataset. To address this limitation, we supplemented it with another sentiment dictionary (containing 7376 positive and 12,646 negative words), thereby enhancing coverage and ensuring greater relevance to contemporary language use among students.

The sentiment calculation formula was modified to align with the objectives of evaluating learners’ emotional engagement in STEAM education using text from learning sheets and real-time discussions. The original formula determined emotions using binary classification, labeling them solely as positive or negative. To address this, we made several adjustments: we incorporated the frequency and relative proportion of positive and negative words within each sentence into the calculation, and by setting a threshold value, we used the quantified emotion scores to classify the polarity of the emotions. This approach enables us to assess and infer each student’s emotional engagement.
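The thresholding and per-student aggregation described here and in Section 3.3 can be sketched as follows. The threshold value is not reported in the text, so the default of 0.0 is purely an assumption, as are the function names `classify_polarity` and `student_engagement` and the student identifiers used in the demonstration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def classify_polarity(score: float, threshold: float = 0.0) -> str:
    """Label a sentence score as positive or negative.

    The threshold is hypothetical; scores at or above it count as positive.
    """
    return "positive" if score >= threshold else "negative"


def student_engagement(records: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum sentence-level sentiment scores per student.

    Each record is (student_id, sentence_score); the sum is the student's
    emotional participation score.
    """
    totals: Dict[str, float] = defaultdict(float)
    for student_id, score in records:
        totals[student_id] += score
    return dict(totals)


if __name__ == "__main__":
    records = [("s01", 1.0), ("s01", -0.2), ("s02", -1.0)]
    for student_id, total in student_engagement(records).items():
        print(student_id, round(total, 2), classify_polarity(total))
```

Keeping the threshold as a parameter makes it easy to test how sensitive the resulting labels are to that choice.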

3.6. Research and Analysis Process

The research and analysis process is shown in Figure 3. Model A is the original SnowNLP method, Model B is Jieba, and Model C is NLTK. This study proposes two models to improve the original SnowNLP method: Model D is SnowNLP retrained with a self-constructed emotion dictionary, and Model E is the SnowNLP + Jieba method. Finally, the accuracy of the different models was analyzed to determine the best model. The Python implementations of Jieba (https://github.com/fxsjy/jieba, accessed on 1 January 2025), SnowNLP (https://github.com/isnowfy/snownlp, accessed on 1 January 2025), and NLTK (https://github.com/nltk/nltk, accessed on 1 January 2025) are available on GitHub.

4. Results and Analysis

In this study, a total of 129 learning texts were collected from first- and second-year university students over a two-week period, as noted in the data collection methods subsection. The experiments were conducted on a desktop computer equipped with an Intel Core i7-9700 processor (8 cores, 3.0 GHz base frequency), 16 GB of DDR4 RAM, and an NVIDIA GeForce GTX 1660 Ti GPU with 6 GB of VRAM. This setup provided adequate computational power for natural language processing (NLP) tasks, including word-breaking using various toolkits (Jieba, SnowNLP, and NLTK), keyword extraction, and emotion analysis. The data processing pipeline included several stages, and we report approximate times based on the system described above. (a) Speech-to-text conversion: Using the Google Voice Recognition API, converting student discussion audio (averaging 5 min per session) to text took approximately 10–15 s per recording, depending on audio quality and network latency. (b) Word-breaking with Jieba: Processing the 129 learning texts (approximately 645–903 sentences) using the Jieba Chinese word-breaking technology took about 2–3 min in total, including stop-word removal. (c) Keyword extraction and analysis: Generating the keyword matrix and word clouds for the entire dataset required approximately 5–7 min, with manual labeling of high/low-participation groups adding an additional 1–2 h of human effort. (d) Emotion analysis: Calculating emotion scores using the combined Chinese Sentiment Dictionary (7376 positive, 12,646 negative words) and NTUSD (2812 positive, 8276 negative words) took roughly 3–4 min for the full dataset, as the system compared keywords against the dictionaries and computed scores per sentence using Equations (1)–(3).

The back-end web database used MySQL 8.0 and was hosted locally on the same machine, with a storage capacity of approximately 50 MB for raw text, segmented keywords, and emotion scores. The AI voice recognition system processed real-time audio input via a standard USB microphone (sampling rate: 44.1 kHz). This study analyzed the high-frequency keywords associated with positive and negative emotions in learning experiences. Keywords that appeared 10 or more times were selected and sorted to create the word cloud.

Model training for the emotion analysis system utilized popular NLP toolkits rather than a machine learning classifier due to the reliance on predefined sentiment dictionaries, which reduced computational overhead while maintaining accuracy for this exploratory study.

4.1. Effect of Sentiment Dictionaries on Model Accuracy

In this study, two different sentiment dictionaries were used to analyze model correctness: Method 1 used the NTUSD sentiment dictionary, and Method 2 used an online sentiment dictionary. The results show that the accuracy rate of Method 1 was higher than that of Method 2. A comparison of the accuracy rates is shown in Table 3. Because the original model does not use a sentiment dictionary, its accuracy rate does not change with the sentiment dictionary. In Models D and E, the accuracy rate was improved by 23.26% and 7.75%, respectively, using the NTUSD sentiment dictionary. This indicates that a more suitable emotion dictionary will help improve the emotion discrimination accuracy. Therefore, the development of an effective sentiment analysis model can be facilitated by creating a dictionary suitable for students’ learning emotions.

4.2. Analysis of the Accuracy of the Different Models

After the first part of the analysis, NTUSD was used as the sentiment dictionary to analyze the accuracy rates of the five text-based sentiment analysis models. A comparison of the accuracy rates of the different models is presented in Table 4. The results of this study showed that the accuracy rates of Models D and E were higher than those of the original models. The best model is Model E, a hybrid model combining SnowNLP and Jieba, with a 95.35% accuracy rate. This shows that the hybrid model proposed in this study is effective for analyzing learners’ emotions. Further analysis of the model improvement showed that the hybrid model (Model E) improved by 55.04% over the original SnowNLP model, 7.75% over the Jieba model (Model B), and 6.2% over the retrained SnowNLP model (Model D) using the NTUSD emotion dictionary.
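The reported improvements can be related back to per-model accuracies with simple arithmetic. The sketch below assumes the improvements are differences in percentage points between Model E and each comparison model; this is an interpretation rather than something the text states, so the resulting values should be checked against Table 4. The names `HYBRID_ACCURACY`, `REPORTED_GAINS`, and `implied_baselines` are hypothetical.

```python
from typing import Dict

# Figures quoted in the text (percent).
HYBRID_ACCURACY = 95.35
REPORTED_GAINS: Dict[str, float] = {
    "Model A (original SnowNLP)": 55.04,
    "Model B (Jieba)": 7.75,
    "Model D (retrained SnowNLP)": 6.20,
}


def implied_baselines(best: float, gains: Dict[str, float]) -> Dict[str, float]:
    """Subtract each reported gain from the best accuracy.

    Assumes the gains are percentage-point differences, not relative changes.
    """
    return {name: round(best - gain, 2) for name, gain in gains.items()}


if __name__ == "__main__":
    for name, accuracy in implied_baselines(HYBRID_ACCURACY, REPORTED_GAINS).items():
        print(f"{name}: {accuracy:.2f}%")
```

Under this reading, the comparison models would sit at roughly 40.31%, 87.60%, and 89.15%, respectively.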

The accuracy rates of the different models are plotted as shown in Figure 4. Both models proposed in this study have better accuracy than the original model, and the hybrid model is the best model, with 95.35% accuracy.

4.3. Keyword Analysis and Word Cloud Analysis

This study used the best model (Model E) to analyze the keywords obtained after word-breaking the learning experiences, and then created word clouds. We analyzed the top keywords for positive and negative emotions in the learning experiences. Keywords with 10 or more occurrences were selected and sorted by frequency. The most frequently used keywords for positive emotions were (1) thinking (18); (2) logic (17); (3) easy (16); (4) practical (13); and (5) interesting (11). This shows that if teaching materials or courses are designed to help students think and build logic in a practical, fun, and easy way, they will help students build positive emotions and increase their engagement in learning. The most frequently used keywords for negative emotions were (1) problem (30); (2) mistake (18); (3) not (10); and (4) error (10). This shows that if learners have problems during the learning process, such as making mistakes or errors in execution results, or if the results are not as expected or wanted, they will have negative emotions and will therefore reduce their participation in learning. The keyword analysis is summarized in Table 5.

The positive and negative keywords of the learners’ learning experiences were rendered as word clouds, as shown in Figure 5 and Figure 6.
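The frequency-threshold step of the keyword analysis can be sketched with the Python standard library. This is an illustration under stated assumptions: the token list is hypothetical and already segmented (in the study, segmentation is performed on the students' text), the helper `top_keywords` is an invented name, and `min_count=10` is chosen so that the reported keywords with exactly 10 occurrences are retained.

```python
from collections import Counter


def top_keywords(tokens, min_count=10):
    """Return (keyword, count) pairs with count >= min_count, most frequent first."""
    counts = Counter(tokens)
    return [(word, n) for word, n in counts.most_common() if n >= min_count]


# Hypothetical, already-segmented tokens from negative learning experiences.
# The counts mirror those reported in this section; "loop" is an invented
# low-frequency token that the threshold filters out.
negative_tokens = (
    ["problem"] * 30 + ["mistake"] * 18 + ["not"] * 10 + ["error"] * 10 + ["loop"] * 3
)

print(top_keywords(negative_tokens))
# [('problem', 30), ('mistake', 18), ('not', 10), ('error', 10)]
```

The resulting (keyword, count) pairs are the kind of input a word cloud renderer typically expects, with the count controlling the displayed size of each word.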

4.4. Discussions

The system’s performance in different educational settings (e.g., online vs. in-person classes) may yield different results. Both settings benefit from the system’s ability to analyze verbal and textual engagement. Online classes can leverage its compatibility with virtual platforms, while in-person settings excel with direct audio capture and immediate interaction. However, online performance might suffer from network instability or noisier audio inputs, whereas in-person settings avoid these issues but require physical presence. The in-person environment can be influenced by external noises such as multiple students talking at once, background music, and simultaneous conversations between teachers and students. Therefore, using post-production manual labeling might be necessary to achieve better results. The rule-based emotion scoring and small dataset limit adaptability in both settings, although manual labeling is more manageable in person with teacher oversight.

The sentiment analysis model’s biases originate from its static dictionary, manual processes, limited sample size, and insufficient contextual depth. Mitigation strategies include refining the dictionary, using more sophisticated natural language processing, standardizing labeling procedures, and expanding data diversity. While the current system is suitable for this study’s STEAM context, these improvements would enhance its reliability across various educational settings while remaining feasible within the study’s resource constraints, such as mid-range hardware.
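To illustrate why a static dictionary with limited contextual depth can misjudge polarity, consider a minimal lexicon-based scorer. This is a sketch, not the authors' implementation: the word sets are a hypothetical mini-lexicon (not the NTUSU dictionary), English tokens are used only for readability, and the one-token negation rule is an assumed example of a more context-aware refinement.

```python
# Hypothetical mini-lexicon for illustration; NOT the NTUSU dictionary.
POSITIVE = {"easy", "interesting", "practical"}
NEGATIVE = {"problem", "mistake", "error"}
NEGATORS = {"not", "no", "never"}


def polarity(tokens, handle_negation=False):
    """Sum +1/-1 per lexicon hit; optionally flip a hit preceded by a negator."""
    score = 0
    for i, tok in enumerate(tokens):
        value = (tok in POSITIVE) - (tok in NEGATIVE)  # +1, -1, or 0
        if handle_negation and i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        score += value
    return score


sentence = "the exercise was not easy".split()
print(polarity(sentence))                        # 1  (static lookup sees only "easy")
print(polarity(sentence, handle_negation=True))  # -1 (negation flips the hit)
```

A context-free lookup labels the sentence positive, whereas even a simple negation rule reverses the result, which is the kind of contextual error that dictionary refinement and richer language processing aim to reduce.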

Adapting the system for non-STEAM subjects involves reconfiguring input prompts, updating the sentiment dictionary, refining keywords, adjusting scoring, and aligning with classroom dynamics. These changes leverage its existing architecture—voice/text analysis and emotion scoring—making it versatile for assessing engagement in subjects such as literature, history, or languages, provided that subject-specific tweaks are applied. Further pilot studies in these areas would validate its effectiveness beyond STEAM.

5. Conclusions

This study developed and compared five text sentiment analysis models. Models A, B, and C are the original SnowNLP, Jieba, and NLTK methods, respectively. Two models were proposed to improve on the original SnowNLP method: Model D is SnowNLP retrained with the sentiment dictionary, and Model E is the hybrid SnowNLP + Jieba method. The study analyzed (1) the effect of sentiment dictionaries on the accuracy of the models, (2) the accuracy of the different models, and (3) the extracted keywords.

First, different sentiment dictionaries significantly affected model accuracy; NTUSU is a relatively good sentiment dictionary for discriminating learning sentiment and can improve model accuracy. Second, among the models compared, the best was Model E, a hybrid model combining SnowNLP and Jieba, with an accuracy rate of 95.35%. Third, the best model was used to identify the keywords of positive and negative emotions. The positive keywords suggest that teaching materials or courses designed to help students think and build logic in a practical and fun way help students build positive emotions and increase their participation in learning. The most frequent keywords for negative emotions were problem, error, not, and mistake, which suggests that learners who encounter problems during the learning process, such as making mistakes or errors and obtaining results that are not what they expected, develop negative emotions and, therefore, lower learning engagement.

5.1. Practical Implications

In formative assessment, the system’s advantages include its capacity to provide real-time feedback that helps optimize teaching, personalize learning paths, and enhance student self-regulation. Its challenges include limited data accuracy in noisy or multi-learner environments, potential privacy violations through continuous emotion monitoring, and the risk that overreliance on AI feedback weakens the emotional bond that teachers and students form through direct interaction. In summative assessment, the AI system can process large-scale data efficiently and enhance the objectivity of evaluation. However, challenges arise from the weak correlation between emotion and learning performance, difficulties in accurately assessing knowledge mastery, and the complexity of capturing the trajectory of emotional changes in long texts for academic evaluation. Future research could therefore examine the correlation between sentiment data and learning performance. In conclusion, the AI-based text sentiment analysis system demonstrates innovative potential in formative assessment, as its dynamic feedback aligns closely with teaching objectives. Its application to summative assessment should be approached with caution until the correlation between emotional data and academic performance is further clarified.
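As one possible starting point for the correlation analysis suggested above, the Pearson coefficient between per-student sentiment scores and a performance measure could be computed as follows. This is a minimal sketch: the function name `pearson_r` and all data values are invented for illustration and are not results from this study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sum(a * b for a, b in zip(dx, dy)) / denom


# Invented values for illustration only; NOT data from this study.
sentiment_scores = [0.15, 0.40, 0.55, 0.70, 0.90]
course_grades = [58, 72, 65, 80, 88]
print(round(pearson_r(sentiment_scores, course_grades), 3))
```

A coefficient near zero on real data would support the caution expressed here about using emotion scores in summative assessment.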

5.2. Limitation and Prospective Research

This study suggests that developing an emotion dictionary for the learning domain, together with the proposed hybrid SnowNLP + Jieba model, is effective for further improving the correct discrimination of learning emotions. By identifying the keywords that affect learning emotions, researchers can design a more appropriate curriculum to enhance positive emotions, which in turn will increase engagement and improve learning effectiveness.

Future studies may combine AI-based sentiment analysis with data from additional measures, such as physiological sensors or computer vision techniques, to provide a more holistic view of student engagement. Longitudinal studies assessing the long-term impact of the AI-based sentiment analysis system on learning engagement and outcomes would offer more comprehensive insights. This study employed a binary classification approach to analyze learning engagement. Future research can use deep learning models, such as convolutional neural networks (CNNs), long short-term memory networks (LSTMs), or large language models (LLMs), to explore the relationships among text sentiment scores, learning outcomes, and levels of learning engagement, as well as to perform prediction or classification.

In addition, teaching styles and methods can impact the effectiveness of an AI-based sentiment analysis system in STEAM education by shaping the quantity, quality, and emotional texture of the text data. Inquiry-based, project-based, and hands-on approaches tend to generate richer, more actionable data, thereby boosting the AI’s ability to measure engagement accurately. However, lecture-based or less interactive methods might challenge the system with sparse or ambiguous inputs. For optimal performance, the AI must be trained on diverse, context-aware datasets tailored to STEAM’s unique blend of creativity and problem-solving, and it should adapt to the nuances that each teaching approach brings to student expression and engagement.

Author Contributions

Conceptualization, C.-H.W. and K.-L.P.; Methodology, C.-H.W.; Investigation, C.-H.W. and K.-L.P.; Resources, C.-H.W.; Data curation, C.-H.W.; Writing—original draft, C.-H.W.; Writing—review & editing, K.-L.P.; Visualization, C.-H.W. and K.-L.P.; Supervision, K.-L.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Ministry of Science and Technology Council (Taiwan) under Grant No. MOST 110-2511-H-142-008-MY2 and NSTC 113-2410-H-003-144-MY3; and City University of Macau.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original data presented in the study are openly available from the Sweslo17 repository [39] and the Ppzhenghua repository [41].

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Conklin, J. A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom’s Taxonomy of Educational Objectives Complete Edition; Springer: Berlin/Heidelberg, Germany, 2005.
2. Fan, S.-C.; Yu, K.-C.; Lin, K.-Y. A Framework for Implementing an Engineering-Focused STEM Curriculum. Int. J. Sci. Math. Educ. 2020, 19, 1523–1524.
3. Lin, K.-Y.; Yu, K.-C.; Hsiao, H.-S.; Chang, Y.-S.; Chien, Y.-H. Effects of web-based versus classroom-based STEM learning environments on the development of collaborative problem-solving skills in junior high school students. Int. J. Technol. Des. Educ. 2020, 30, 21–34.
4. Wu, C.-H.; Liu, C.-H.; Huang, Y.-M. The exploration of continuous learning intention in STEAM education through attitude, motivation, and cognitive load. Int. J. STEM Educ. 2022, 9, 35.
5. Kuo, H.-C.; Tseng, Y.-C.; Yang, Y.-T.C. Promoting college student’s learning motivation and creativity through a STEM interdisciplinary PBL human-computer interaction system design and development course. Think. Ski. Creat. 2019, 31, 1–10.
6. Kazimoglu, C.; Kiernan, M.; Bacon, L.; MacKinnon, L. Learning Programming at the Computational Thinking Level via Digital Game-Play. Procedia Comput. Sci. 2012, 9, 522–531.
7. Eltanahy, M.; Forawi, S. Science Teachers’ and Students’ Perceptions of the Implementation of Inquiry-Based Learning Instruction in a Middle School in Dubai. J. Educ. 2019, 199, 13–23.
8. Shaikh, M.A.M.; Prendinger, H.; Ishizuka, M. Sentiment assessment of text by analyzing linguistic features and contextual valence assignment. Appl. Artif. Intell. 2008, 22, 558–601.
9. Iglesias, C.A.; Moreno, A. Sentiment Analysis for Social Media. Appl. Sci. 2019, 9, 5037.
10. Carosia, A.E.O.; Coelho, G.P.; Silva, A.E.A. Analyzing the Brazilian Financial Market through Portuguese Sentiment Analysis in Social Media. Appl. Artif. Intell. 2020, 34, 1–19.
11. Imtiaz, M.N.; Ben Islam, M.K. Identifying Significance of Product Features on Customer Satisfaction Recognizing Public Sentiment Polarity: Analysis of Smart Phone Industry Using Machine-Learning Approaches. Appl. Artif. Intell. 2020, 34, 832–848.
12. Avtalion, Z.; Aviv, I.; Hadar, I.; Luria, G.; Bar-Gil, O. Digital Infrastructure as a New Organizational Digital Climate Dimension. Appl. Sci. 2024, 14, 8592.
13. Hmelo-Silver, C.E. Problem-based learning: What and how do students learn? Educ. Psychol. Rev. 2004, 16, 235–266.
14. Lin, K.-Y.; Hsiao, H.-S.; Williams, P.J.; Chen, Y.-H. Effects of 6E-oriented STEM practical activities in cultivating middle school students’ attitudes toward technology and technological inquiry ability. Res. Sci. Technol. Educ. 2020, 38, 1–18.
15. Marcus, M.; Haden, C.A.; Uttal, D.H. Promoting children’s learning and transfer across informal science, technology, engineering, and mathematics learning experiences. J. Exp. Child Psychol. 2018, 175, 80–95.
16. Thuneberg, H.; Salmi, H.; Bogner, F.X. How creativity, autonomy and visual reasoning contribute to cognitive learning in a STEAM hands-on inquiry-based math module. Think. Ski. Creat. 2018, 29, 153–160.
17. Mutambara, D.; Bayaga, A. Determinants of mobile learning acceptance for STEM education in rural areas. Comput. Educ. 2021, 160, 104010.
18. Azkarai, A.; Kopinska, M. Young EFL learners and collaborative writing: A study on patterns of interaction, engagement in LREs, and task motivation. System 2020, 94, 102338.
19. Baker, A.R.; Lin, T.-J.; Chen, J.; Paul, N.; Anderson, R.C.; Nguyen-Jahiel, K. Effects of teacher framing on student engagement during collaborative reasoning discussions. Contemp. Educ. Psychol. 2017, 51, 253–266.
20. Zhang, X.; Meng, Y.; Ordóñez de Pablos, P.; Sun, Y. Learning analytics in collaborative learning supported by Slack: From the perspective of engagement. Comput. Hum. Behav. 2019, 92, 625–633.
21. Fredricks, J.A.; Blumenfeld, P.C.; Paris, A.H. School engagement: Potential of the concept, state of the evidence. Rev. Educ. Res. 2004, 74, 59–109.
22. Luo, F.; Antonenko, P.D.; Davis, E.C. Exploring the evolution of two girls’ conceptions and practices in computational thinking in science. Comput. Educ. 2020, 146, 103759.
23. Bathgate, M.; Schunn, C. Factors that deepen or attenuate decline of science utility value during the middle school years. Contemp. Educ. Psychol. 2017, 49, 215–225.
24. Dixson, M.D. Creating effective student engagement in online courses: What do students find engaging? J. Scholarsh. Teach. Learn. 2010, 10, 1.
25. D’Mello, S.; Olney, A.; Williams, C.; Hays, P. Gaze tutor: A gaze-reactive intelligent tutoring system. Int. J. Hum.-Comput. Stud. 2012, 70, 377–398.
26. Mota, S.; Picard, R.W. Automated posture analysis for detecting learner’s interest level. In Proceedings of the 2003 Conference on Computer Vision and Pattern Recognition Workshop, Madison, MI, USA, 16–22 June 2003; p. 49.
27. Milosz, M.; Plechawska-Wójcik, M.; Dzieńkowski, M. Testing the Quality of the Mobile Application Interface Using Various Methods—A Case Study of the T1DCoach Application. Appl. Sci. 2024, 14, 6583.
28. Lee, V.R.; Fischback, L.; Cain, R. A wearables-based approach to detect and identify momentary engagement in afterschool Makerspace programs. Contemp. Educ. Psychol. 2019, 59, 101789.
29. Li, Z.; Zhan, Z. Integrated infrared imaging techniques and multi-model information via convolution neural network for learning engagement evaluation. Infrared Phys. Technol. 2020, 109, 103430.
30. Li, S.; Lajoie, S.P.; Zheng, J.; Wu, H.; Cheng, H. Automated Detection of Cognitive Engagement to Inform the Art of Staying Engaged in Problem-solving. Comput. Educ. 2020, 59, 104114.
31. Thomas, C.; Jayagopi, D.B. Predicting student engagement in classrooms using facial behavioral cues. In Proceedings of the 1st ACM SIGCHI International Workshop on Multimodal Interaction for Education, Glasgow, UK, 13 November 2017; pp. 33–40.
32. Xie, G.; Liu, N.; Hu, X.; Shen, Y. Toward Prompt-Enhanced Sentiment Analysis with Mutual Describable Information Between Aspects. Appl. Artif. Intell. 2023, 37, 2186432.
33. Trisna, K.W.; Jie, H.J. Deep Learning Approach for Aspect-Based Sentiment Classification: A Comparative Review. Appl. Artif. Intell. 2022, 36, 2014186.
34. Heflin, H.; Shewmaker, J.; Nguyen, J. Impact of mobile technology on student attitudes, engagement, and learning. Comput. Educ. 2017, 107, 91–99.
35. Cole, P.G.; Chan, L.K.S. Teaching Principles and Practice; Prentice Hall: New York, NY, USA, 1994.
36. Lane, E.S.; Harris, S.E. A new tool for measuring student behavioral engagement in large university classes. J. Coll. Sci. Teach. 2015, 44, 83–91.
37. Ganesan, K. AI Implementation, Hands-On NLP, Text Mining. 2020. Available online: https://kavita-ganesan.com/extracting-keywords-from-text-tfidf/#.X-Xrqluzayp5 (accessed on 1 January 2025).
38. EternalFeather. Chinese-Sentiment-Lexicon. Github 2017. Available online: https://github.com/EternalFeather/Chinese-Sentiment-Lexicon (accessed on 1 January 2025).
39. Sweslo17. Chinese_Sentiment. Github 2015. Available online: https://github.com/sweslo17/chinese_sentiment/tree/master/dict (accessed on 1 January 2025).
40. Chen, W.-F.; Ku, L.-W. Introduction to CSentiPackage: Tools for Chinese Sentiment Analysis. J. Libr. Inf. Sci. 2018, 44, 24–41. (In Chinese)
41. Ppzhenghua. Sentiment Analysis Dictionary. Github 2022. Available online: https://github.com/ppzhenghua/SentimentAnalysisDictionary (accessed on 1 January 2025).

Figure 1. Bloom’s taxonomy of learning [1].

Figure 3. Research and analysis process.

Figure 4. Model accuracy comparison chart.

Figure 5. Positive emotions word cloud.

Figure 6. Negative emotions word cloud.

Table 1. Learning sheet design.

Question: Enter n and print out the corresponding star.
Operational Thinking	Classmate’s Answer	Reference to the answer provided by the teacher
Breaking down the problem: Break down a big problem into several small problems		Input: Get input n. Processing. Output: Print blank, print star
Pattern Recognition: Identify places (commands) that are repeatedly executed		Repeatedly: Blank printing, single-line star printing
Abstraction: Write complex calculations as subroutines, find available functions		Available functions: print, input. Subroutines: print_space, print_star, print a single line
Algorithm: Use all available subroutines or functions to map out the steps and flow of program execution		Flow Chart
Q1. Please explain why each step is designed this way. Question Breakdown: Pattern recognition: Abstraction: Algorithm: Subroutine values
Q2. Please explain if this exercise is helpful to you. Why?
Q3. Does the loop value you planned on paper work correctly (Yes/No)? If not, what do you think is the reason?
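
The decomposition in Table 1 can be sketched in Python as follows. The subroutine names `print_space` and `print_star` come from the learning sheet; the target shape is not specified there, so a right-aligned triangle is assumed here, and the helpers `line_parts` and `print_line` and the fixed value of `n` are illustrative choices.

```python
def line_parts(row: int, n: int) -> tuple[int, int]:
    """Blanks and stars on a 1-based row of an n-row right-aligned triangle (assumed shape)."""
    return n - row, row


def print_space(count: int) -> None:
    """Subroutine from the learning sheet: print blanks without a newline."""
    print(" " * count, end="")


def print_star(count: int) -> None:
    """Subroutine from the learning sheet: print stars without a newline."""
    print("*" * count, end="")


def print_line(row: int, n: int) -> None:
    """Print a single line: leading blanks, then stars, then a newline."""
    blanks, stars = line_parts(row, n)
    print_space(blanks)
    print_star(stars)
    print()


n = 4  # a learner would read this value with int(input())
for row in range(1, n + 1):  # the repeated pattern identified in the sheet
    print_line(row, n)
```

Each function corresponds to one row of the sheet: the loop is the recognized pattern, the subroutines are the abstraction, and the call order is the algorithm.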

Table 3. Comparison of the accuracy rates of different dictionary models.

Model	Method	(1) Accuracy with Dictionary #1 (NTUSU Dict.)	(2) Accuracy with Dictionary #2 (Sentiment Dict.)	Improvement, (1) − (2)
Original models
Model A	snowNLP	40.31%	40.31%
Model B	jieba	87.60%	87.60%
Model C	nltk	78.29%	78.29%
Our proposed revised models
Model D	snowNLP + training	89.15%	65.89%	+23.26%
* Model E	snowNLP + jieba	95.35%	87.60%	+7.75%

* denotes best model.

Table 4. Comparison of the accuracy rates of different models.

Model	Method	Accuracy	Improvement, (E) − (A)	Improvement, (E) − (B)	Improvement, (E) − (D)
Original models
Model A	snowNLP	40.31%
Model B	jieba	87.60%
Model C	nltk	78.29%
Our proposed revised models
Model D	snowNLP + training	89.15%
* Model E	snowNLP + jieba	95.35%	+55.04%	+7.75%	+6.20%

* denotes best model.

Table 5. Keyword analysis.

Positive keyword	Count	Negative keyword	Count
1. Thinking	18	1. Problem	30
2. Logic	17	2. Mistake	18
3. Easy	16	3. Not	10
4. Practical	13	4. Error	10
5. Interesting	11

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
