1. Introduction
The advancement of artificial intelligence (AI) has affected all walks of life, and education is no exception. The Ministry of Education’s “Compulsory Education Mathematics Curriculum Standards (2022 Edition)” states: “Attach importance to the role of big data, artificial intelligence, etc. in promoting mathematics teaching reform, improve teaching methods, and promote changes in students’ learning methods.” The advent of large language models (LLMs) has heightened this interest, and more people are paying attention to the application of LLMs in education.
Large models are machine learning models with a large number of parameters and complex computational structures, typically built on deep neural networks. With billions of parameters and training on vast amounts of data, they can generalize to unseen data and handle complex tasks. Based on the type of input data, large models are classified into language models, vision models, and multimodal models. In education, attention is currently focused on LLMs, such as bidirectional encoder representations from transformers (BERT), the text-to-text transfer transformer (T5), and the generative pre-trained transformer (GPT) series [1], as well as enhanced representation through knowledge integration (ERNIE Bot), among which ChatGPT has garnered the most attention.
ChatGPT is an AI chatbot launched by OpenAI in November 2022. It is trained on a large amount of text data and learns a probabilistic model that predicts the next likely word from the preceding words and sentences. Its powerful natural language processing and context understanding capabilities give it broad application prospects in various fields. Cui et al. [2] argued that GPT can support teaching content optimization, teaching process guidance, teaching method optimization, academic paper writing, and the evaluation of teaching and learning effects. GPT can help improve teaching quality and efficiency, and can also serve as an auxiliary teaching tool, answer professional academic questions, support independent learning platforms, save human resource costs, and reshape the structure of school education [3]. Zhang et al. [4] used ChatGPT to create intelligent teaching scenarios in middle school geography, promote innovation in geography learning methods, and reshape geography teaching evaluation methods. Guo et al. [5] took a data structure course as an example and explored how to use ERNIE Bot, the LLM developed by Baidu, in university teaching. Wang et al. [6] applied AI chatbots such as Baidu ERNIE Bot and ChatGPT in programming teaching.
Li et al. [7] found that ChatGPT can be used to design lesson plans for teachers and proposed practices for lesson preparation in junior high school mathematics. However, it cannot simulate real teaching or account for actual educational objects and educational behavior, so it remains an auxiliary teaching tool, albeit an excellent one. ChatGPT has great potential for improving education, solving mathematical problems, and supporting student learning [8]. For example, it helps teachers and educators generate personalized and relevant educational content to enhance student participation, enthusiasm, and academic performance. ChatGPT is also a valuable tool for educational assessment: teachers can use it to evaluate students’ homework and provide feedback. Shakarian et al. [9] studied the performance of LLMs on mathematical word problems (MWPs) and found that the performance of ChatGPT changes depending on the requirements, and that the probability of failure increases linearly with the number of addition and subtraction operations.
For math problems specifically, Xueersi proposed the Jiuzhang Large Model (MathGPT). It was independently developed by Tomorrow Advancing Life (TAL) as a large model for math problem-solving and problem-explaining, aimed at global users and research institutions. It is also the first large model in China built specifically for math. MathGPT’s capabilities cover math problems at the elementary, middle, and high school levels, including calculation, application, and algebraic questions. However, question-and-answer interactions outside math are not yet available.
2. LLMs in Middle School Mathematics Teaching
Whether they come from ERNIE Bot, ChatGPT, or MathGPT, the answers given by LLMs are generated by a probability model and might not be accurate. Even two answers to the same question might differ, which is a problem in mathematics, a subject that emphasizes accuracy and clarity. The main user group of LLMs should therefore be teachers, who understand the mathematical knowledge, rather than students, who are still acquiring it. This article mainly analyzes the application of LLMs by teachers, where the most obvious change is the transformation of teaching methods. LLMs can be used by teachers, but students must use them with caution.
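The variability described above can be illustrated with a toy sketch. The candidate answers and their probabilities below are invented for illustration and are not measured from any real model; the only point is that sampling from a probability distribution can return different answers to the same question.

```python
import random

# Hypothetical distribution over candidate answers to "What is 7 x 8?".
# These probabilities are made up for illustration, not taken from any LLM.
CANDIDATES = {"56": 0.85, "54": 0.10, "58": 0.05}


def sample_answer(rng):
    """Draw one answer according to the assumed probabilities."""
    answers = list(CANDIDATES)
    weights = list(CANDIDATES.values())
    return rng.choices(answers, weights=weights, k=1)[0]


def answer_counts(n_trials, seed=0):
    """Ask the same 'question' n_trials times and count each answer."""
    rng = random.Random(seed)
    counts = {answer: 0 for answer in CANDIDATES}
    for _ in range(n_trials):
        counts[sample_answer(rng)] += 1
    return counts
```

Calling answer_counts(1000) tallies how often each candidate is returned. Under these assumed probabilities, the correct answer dominates, but the wrong ones can still appear, which is why a teacher's check remains necessary.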
2.1. Traditional Teaching Method
Before the emergence of LLMs, mathematics instruction took place through face-to-face interaction between teachers and students, with classroom teaching as the main method and online learning as a supplementary tool. Traditional classroom teaching is a teacher-centered, lecture-based approach in which students passively receive knowledge. This teaching model has been used for a long time and is favored by many teachers. It allows for rapid and efficient knowledge delivery, enabling teachers to complete the required teaching tasks within a specified time. It also enables immediate feedback from students, allowing the teacher to assess students’ understanding of the mathematical concepts and adjust the teaching process accordingly. However, traditional mathematics teaching methods often fail to inspire students’ enthusiasm for learning, causing them to dislike mathematics. In addition, traditional teaching models cannot accommodate the different learning styles and paces of individual students. Classroom teaching tends toward rote instruction, leaving students with few independent activities and little spirit of inquiry.
2.2. LLMs in Supporting Teaching
LLMs can serve as a teaching assistant for teachers and a tool to enhance the learning experience of students. They provide teachers with rich lesson-planning ideas and materials to refine the design of teaching activities, which in turn stimulates students’ curiosity for exploration and discovery and fosters active learning. LLMs can generate interesting mathematical problems and challenges that encourage students to exercise creative thinking and problem-solving abilities. They can also help tailor teaching to different student abilities, mitigating “one size fits all” issues and enabling students to be taught according to their aptitude. Current education must be student-centered, and each student has his or her own learning style and pace. Trained on a large amount of data, LLMs are “well-informed” and can provide personalized guidance plans based on the different situations of students. With the help of LLMs, teachers can personalize teaching more easily, which greatly reduces the difficulty of teaching students with different aptitudes.
2.3. Integrating Programming Languages to Enrich Teaching Methods
Jiang et al. [10] drew on their confusion and experience in middle school mathematics teaching and pointed out that ChatGPT, as a language model, cannot combine numbers and shapes. This problem can be addressed by pairing the LLM with a programming language, which yields a teaching method that combines numbers and shapes: the LLM writes the code, and the programming language draws the graph. In the past, middle school mathematics teachers rarely used programming languages in teaching because they lacked specialized programming knowledge. LLMs such as ERNIE Bot and ChatGPT lower this barrier. Graphical tasks can now often be completed by running the code provided by an LLM. As an example, we used ERNIE Bot and ChatGPT to generate Python 3.10 code that draws a parabola with parameters set interactively by the user.
In the experiment, we asked ERNIE Bot 4.5 and ChatGPT-3.5 to each generate a segment of code. The response from ERNIE Bot is presented on the left of Figure 1, and the response from ChatGPT-3.5 is on the right.
We ran the generated code without any modifications, using Jupyter Notebook 7 with Anaconda. The result from ChatGPT-3.5 is presented in Figure 2.
The result from ERNIE Bot is presented in Figure 3 (the colored boxes are annotations and were not produced by the code).
ERNIE Bot produced better results in the experiment. The parabola can be adjusted by dragging the three sliders in the blue box area or by modifying the parameters in the green box.
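For readers without access to the figures, the kind of program discussed in this section can be sketched as follows. This is not the code returned by ERNIE Bot or ChatGPT; it is a minimal illustration written for this text, assuming matplotlib is installed. The function names, slider ranges, and axis limits are our own arbitrary choices.

```python
def parabola_points(a, b, c, x_min=-10.0, x_max=10.0, n=201):
    """Return x and y lists for y = a*x**2 + b*x + c on [x_min, x_max]."""
    step = (x_max - x_min) / (n - 1)
    xs = [x_min + i * step for i in range(n)]
    ys = [a * x * x + b * x + c for x in xs]
    return xs, ys


def show_interactive_parabola():
    """Plot a parabola whose coefficients are set with three sliders."""
    # Imported here so the helper above works without matplotlib installed.
    import matplotlib.pyplot as plt
    from matplotlib.widgets import Slider

    fig, ax = plt.subplots()
    fig.subplots_adjust(bottom=0.30)  # leave room for the sliders
    xs, ys = parabola_points(1.0, 0.0, 0.0)
    (line,) = ax.plot(xs, ys)
    ax.set_ylim(-50, 50)
    ax.grid(True)

    # One slider per coefficient; the ranges are illustrative assumptions.
    sliders = {}
    for i, (name, init) in enumerate([("a", 1.0), ("b", 0.0), ("c", 0.0)]):
        slider_ax = fig.add_axes([0.15, 0.18 - 0.06 * i, 0.70, 0.03])
        sliders[name] = Slider(slider_ax, name, -5.0, 5.0, valinit=init)

    def update(_value):
        # Recompute the curve from the current slider values and redraw.
        _, new_ys = parabola_points(
            sliders["a"].val, sliders["b"].val, sliders["c"].val
        )
        line.set_ydata(new_ys)
        fig.canvas.draw_idle()

    for slider in sliders.values():
        slider.on_changed(update)

    plt.show()
    return sliders  # keep a reference so the widgets stay responsive
```

Calling show_interactive_parabola() opens the plot. In a Jupyter notebook the sliders respond only when an interactive matplotlib backend is active, so the setup may need adjusting depending on the environment.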
3. Challenges in Using LLMs
3.1. Insufficient Accuracy in Problem-Solving
Mathematics emphasizes accuracy and clarity, so the accuracy of problem-solving is crucial when LLMs are used as intelligent tutors in math education. To illustrate this, we used the data published on the official MathGPT website to compare the results of MathGPT and ChatGPT on the open-source datasets TAL-SCQ5K-CN and TAL-SCQ5K-EN. Each dataset consists of single-choice math questions at the elementary, middle, and high school levels with detailed solution steps, split into a 3K training set and a 2K test set (Table 1).
MathGPT’s results were better than those of ChatGPT4, although the accuracy of the two was similar. ChatGPT4 was suited to solving English problems, whereas MathGPT was better at solving Chinese problems. LLMs can still provide wrong answers, which prevents them from playing a guiding role, has a negative impact on students, and misleads their cognition. Because students cannot judge whether the knowledge is correct, mathematics teachers need to take part in students’ personalized learning as supervisors.
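The accuracy figure used in such comparisons is simply the share of single-choice questions answered correctly. A minimal sketch is given below; the answer key and model outputs are hypothetical placeholders, not items from TAL-SCQ5K or values from Table 1.

```python
def accuracy(predictions, answer_key):
    """Fraction of questions where the predicted option matches the key."""
    if len(predictions) != len(answer_key):
        raise ValueError("predictions and answer_key must have equal length")
    if not answer_key:
        raise ValueError("answer_key must not be empty")
    correct = sum(p == k for p, k in zip(predictions, answer_key))
    return correct / len(answer_key)


# Hypothetical answer key and model outputs for five single-choice questions.
answer_key = ["A", "C", "B", "D", "A"]
model_output = ["A", "C", "D", "D", "B"]
print(f"accuracy = {accuracy(model_output, answer_key):.2f}")
```

A teacher could apply the same function to a small set of questions with known answers before relying on a model for a given topic.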
3.2. Lack of Logical Consistency
The logical reasoning of LLMs is inherently limited. First, LLMs such as ChatGPT are generative models built on statistical modeling. The answers they generate come from the statistical regularities learned from a large amount of text data during training; the observed language patterns and statistical probabilities are the source of the generated answers. There is no explicit logical reasoning step in the answering process. This inevitably limits LLMs when they deal with logical problems, which is most obvious in complex reasoning and inference problems. Logical reasoning is important in middle school mathematics, so this limitation restricts the use of LLMs there and can produce logical errors.
In addition, each training run requires substantial manpower and material resources, and the training data come from large-scale text corpora on the Internet. LLMs have no knowledge of anything that appears after training, which leads to potential errors in their answers, and the Internet text used to build the training datasets is not always correct. In contrast, programming languages produce more reliable results because of their strict, deterministic nature.
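The contrast with programming languages can be made concrete: an answer proposed by an LLM can be checked by exact computation rather than trusted. The sketch below is an illustration only; the quadratic and the claimed answers are hypothetical examples chosen for this text.

```python
from fractions import Fraction


def is_root(a, b, c, candidate):
    """Check exactly whether candidate satisfies a*x**2 + b*x + c = 0."""
    x = Fraction(candidate)  # exact rational arithmetic, no rounding error
    return a * x * x + b * x + c == 0


# Hypothetical answers proposed for 2x^2 - 3x + 1 = 0 (true roots: 1/2 and 1).
claimed = [Fraction(1, 2), Fraction(1), Fraction(3, 2)]
for value in claimed:
    print(value, is_root(2, -3, 1, value))
```

Substitution gives the same verdict on every run, so a teacher can use a check of this kind to screen generated solutions before they reach students.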
3.3. Dependence
Excessive use of LLMs makes people dependent on them, and excessive dependence can cause people to lose their basic judgment and creativity. Because LLMs are so convenient, people may come to accept their output uncritically. In pursuing convenience and efficiency, people tend not to think carefully, and this inactive thinking becomes an obstacle to the improvement of teachers’ abilities. LLMs are auxiliary tools, and the improvement of teaching quality still depends mainly on the judgment and wisdom of teachers.