Developing a Task-Based Dialogue System for English Language Learning

This research involved the design of a task-based dialogue system and the evaluation of its learning effectiveness. Dialogue training still depends heavily on human communication with instant feedback or correction, yet it is not possible to provide a personal tutor for every English learner. With the rapid development of information technology, digitized learning and voice communication offer a possible solution. The goal of this research was to develop an innovative model that refines the task-based dialogue system, including natural language understanding, intent disassembly, and dialogue state tracking. To enable the dialogue system to find the corresponding sentence accurately, it was designed with machine learning algorithms that allow users to communicate in a task-based fashion. Past research has pointed out that computer-assisted instruction has achieved remarkable results in language reading, writing, and listening. This study therefore explores the use of a task-oriented dialogue system as a speaking teaching assistant. To train speaking ability, the proposed system provides a simulation environment with goal-oriented characteristics, allowing learners to continuously improve their speaking fluency through simulated conversational exercises. To evaluate whether the proposed system could replace traditional English speaking practice, an experiment was carried out in a small English speaking class. Data from 28 students performing three assigned tasks were collected and analyzed. The collected student feedback is promising and shows positive perceptions of the system's user interface, learning style, and effectiveness.


Introduction
With the rise of Internet technology, many educational institutions are gradually turning their attention to digital education. Various online learning methods enable people to make good use of their spare time to learn, greatly enhancing the efficiency of traditional learning. Technologies integrating learning approaches such as pragmatic, contextual, or cooperative learning have shown great success in language learning [1][2][3]. In digital language learning, speaking is considered one of the most important parts of learning a foreign language [4]. Social interaction is a key factor in improving speaking fluency; however, the cost of creating a social language-learning environment is too high for wide implementation [5,6]. Researchers therefore seek to adopt computer-aided technologies to create an environment similar to speaking with native English speakers. With the

1. Information-gap activity: Learners exchange information to fill an information gap. They communicate with each other in the target language to ask questions and solve problems.
2. Opinion-gap activity: Learners express their feelings, ideas, and personal preferences to complete the task. In addition to having learners interact with each other, teachers can add personal tasks to the theme to stimulate the learners' potential.
3. Reasoning-gap activity: Students derive new information by reasoning from existing information, for example, inferring the meaning of the dialogue topic or an implied association in a sentence from the dialogue process.
Willis outlined the teaching process of task-based language teaching in three stages: the pre-task stage, the task cycle stage, and the language focus stage [18,19]. Together, these stages form a complete language learning process. The pre-task stage introduces the task and provides students with clear instructions on what must be done during the task phase [19]. This helps students review the language skills associated with the task. Through the execution of task activities, the teacher can judge the students' learning status on the topic. In the task cycle stage, students use the words and grammar learned during the task preparation phase and think about how to complete the tasks and presentations. In this process, the teacher acts as a supervisor, giving appropriate guidance and language-related resources. In the last stage, the language focus stage, students and teachers review issues encountered during the previous phase, such as the use of words, grammar, or sentence structure. The teacher then guides the students to practice based on the analyzed results and improve their language comprehension.
The efficiency and crucial factors of task-based language learning have been examined in studies from different perspectives. Research shows a significant improvement in speaking comprehension [20][21][22]. Rabbanifar and Mall-Amiri indicate that the reasoning-gap activity is the key factor for speaking complexity and accuracy [21].
The present study adopted the three-stage model shown in Figure 1 to develop the task-based dialogue system [16]. In the pre-task stage, the system needs to present the task and let students clearly understand the goals to accomplish throughout the conversation. In the task cycle, the system needs to interact with students and guide them to finish the task. In the language focus stage, the system needs to evaluate the students' performance and give proper feedback.

A task-based dialogue system usually has a clearly defined task, such as helping users order meals or learn languages [23]. Such a dialogue robot contains basic modules including Dialogue Script, Dialogue Manager, Natural Language Understanding, and Natural Language Generation. As shown in Figure 2 [24], the widely used approach is to treat dialogue response generation as a pipeline. The system must first understand the information conveyed by the user and convert it into an internal representation. According to the state of the conversation, the system then generates the corresponding reply actions and finally converts these actions into natural language. Although language understanding is usually handled by statistical models, most established dialogue systems still use manual features or manually defined rules for identifying state and action representations, semantic detection, and slot filling [24].

Figure 2. Task-based dialogue system.

Dialogue systems for language learning have been implemented with different algorithms for years [25][26][27]. From statistical models to pattern recognition, the applications have become more practical and widely developed with the advancement of text mining and natural language processing technologies [25]. Several advantages of using dialogue systems for language learning have been reported. Language-learning dialogue systems are considered fun and easy for students to approach [25,26]. In addition, a dialogue system is easily integrated with teaching methods such as grammar checking and repetition [25]. Beyond carrying out the task, the proposed dialogue system needs to focus more on language learning, so functions regarding speaking comprehension need to be considered and developed in the system.

Educ. Sci. 2020, 10, 0306 4 of 20
In recent years, hardware and software technologies have grown rapidly, and media attention toward artificial intelligence and machine learning continues to rise. These technologies make it possible for machine learning and human-computer interaction applications to store and process large amounts of data and perform massive calculations. Many researchers have turned to applications of natural language processing [28][29][30]. Natural language processing is the communication channel between humans and machines. Whether the goal is natural language understanding or natural language interaction, it is also one of the most difficult problems in computer science. Nevertheless, natural language processing applications have been proposed in different fields, such as machine translation, dialogue robots, data retrieval, and summary generation. Among those applications, task-oriented robots have shown the capability of solving special-purpose problems; food-ordering robots in restaurants and customer service robots are common examples. In education, computer-assisted teaching robots can help learners improve oral fluency and build self-confidence in speaking foreign languages.
The decision-making technology of dialogue systems (chatbots) has gradually matured, an example being the Siri artificial intelligence assistant in Apple's iOS [31,32]. Through natural language processing technology, people can use dialogue to interact smoothly with mobile devices, for example to query the weather, make phone calls, and set up to-do items [33][34][35]. Dialogue systems are widely used. With the rapid growth of chatbots in recent years, many companies have invested resources in building dedicated dialogue robots so that customers receive instant responses and labor costs are reduced [34,35]. Because a chatbot is built on a dialogue system, it must simulate human dialogue, and that dialogue must have a meaningful purpose. Since human languages are ambiguous to a degree, it remains a challenge for today's chatbots to understand all kinds of questions and respond correctly. Dialogue training still depends heavily on human communication with instant feedback or correction [32,36]. However, it is not possible to provide a personal tutor for every English learner.
Therefore, this study involved the development of a task-based dialogue system that combines task-based language teaching with a dialogue robot. The proposed system contains functions to carry out the conversational task, including natural language understanding, intent disassembly, and dialogue state tracking. The research objectives were as follows:
1. Development of a task-based dialogue system that is able to conduct a task-oriented conversation and evaluate students' performance after the conversation;
2. Comparison of the differences between the proposed system and traditional methods;
3. Evaluation of the effectiveness of the proposed system.
The first step of this study was to survey related studies on task-based learning methodology and task-based dialogue systems to establish the fundamental curriculum design and interfaces of the system. Section 2 proposes a novel framework for a task-based dialogue-learning model. Section 3 elaborates on the experiment and the results. Finally, Section 4 summarizes the results and discusses limitations and future work.

Proposed Task-Based Dialogue-Learning Model
This study involved the development of a dialogue system that combines task-based teaching methods to assist teachers in guiding students to complete dialogue tasks with the dialogue robot. A complete set of dialogue scripts used by teachers was constructed. Scoring criteria for grammar, sentences, and speaking, similar to those used by a regular English teacher, were then established. To validate the performance of the proposed model, an experimental evaluation was designed to explore the learning style and learning status compared with traditional teaching methods.
The dialogue system is composed of multiple modules, as shown in Figure 3. In the task-based dialogue system, the learner's speech is transcribed by the automatic speech recognition (ASR) module, and the information is recorded by the dialogue manager. The information is forwarded to the natural language understanding module, which processes and interprets the semantics expressed by the learner in the conversation. The extracted result is converted into a semantic vector and compared with the pre-constructed dialogue script set. The statement decision module outputs the corresponding response based on the dialogue policy. Finally, the natural language generation module converts the semantic vector into the corresponding dialogue, which can be delivered by a text-to-speech (TTS) module. In this way, multi-turn dialogue can be implemented so that the system can continuously correct the learner or guide the learner back to the scope of the script collection. The system is also equipped with an exception handling function for instances where the conversation is unclear or goes off track. Like a real person, the task-based dialogue system must not only use natural language processing to understand sentences but also give a reasonable response according to the current state.
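To make the data flow between these modules concrete, the following Python sketch wires them into one conversational turn. It is a minimal illustration under stated assumptions, not the authors' implementation: the module interfaces (`asr`, `nlu`, `match_script`, `policy`, `nlg`, `tts`) and the `DialoguePipeline` class are hypothetical, and the keyword-based stubs in the usage example stand in for real speech recognition, word-vector understanding, and speech synthesis.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class DialoguePipeline:
    """One conversational turn through the modules (hypothetical interfaces)."""
    asr: Callable[[bytes], str]                           # speech -> transcript
    nlu: Callable[[str], List[float]]                     # transcript -> semantic vector
    match_script: Callable[[List[float]], Optional[str]]  # vector -> script intent, or None if off-script
    policy: Callable[[Optional[str]], str]                # intent -> system action
    nlg: Callable[[str], str]                             # action -> reply sentence
    tts: Callable[[str], bytes]                           # reply sentence -> speech
    history: List[str] = field(default_factory=list)      # dialogue manager's record of the turns

    def turn(self, audio: bytes) -> bytes:
        text = self.asr(audio)
        self.history.append(text)                 # dialogue manager records the learner's utterance
        intent = self.match_script(self.nlu(text))
        action = self.policy(intent)              # the policy must also handle intent=None
        reply = self.nlg(action)
        self.history.append(reply)
        return self.tts(reply)


# Toy stubs: "audio" is just UTF-8 text and the NLU only detects one keyword.
REPLIES = {
    "confirm_order": "One burger. Anything to drink?",
    "reprompt": "Sorry, what would you like to order?",
}
pipeline = DialoguePipeline(
    asr=lambda audio: audio.decode("utf-8"),
    nlu=lambda text: [1.0 if "burger" in text.lower() else 0.0],
    match_script=lambda vec: "order_burger" if vec[0] > 0.5 else None,
    policy=lambda intent: "confirm_order" if intent else "reprompt",
    nlg=lambda action: REPLIES[action],
    tts=lambda text: text.encode("utf-8"),
)
print(pipeline.turn(b"I'd like a burger, please").decode())
```

Passing the modules in as plain callables keeps the turn logic independent of any particular ASR or TTS engine; the off-script case (`None` from `match_script`) is where the exception handling described above would re-prompt the learner.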
Before the learner uses the system, the teacher conducts course training on the topic of the conversation and designs a tree-like dialogue script set in the system in advance. Since a task always involves decision-making, the series of decisions for a particular task can usually be represented by a decision tree; accordingly, the dialogue script is also hierarchical. The dialogues cover various topics and specific tasks, such as ordering food or buying tickets. All the scripts were designed by professional English teachers. These conversational themes include basic elements of language such as grammar, language skills, and cultural integration. Each dialogue topic has a sequence of tasks to complete, which can be represented by a complete dialogue structure. Figure 4 shows an example of dialogue branches for ordering food, where N represents the non-player character and S represents the student. In this example, there are three conversation rounds (N1, N2, N3) for one dialogue task, and each layer has one or more possible response sentences. The system determines the dialogue path based on the learner's answer. Initially, the system starts the Q&A at the N1 layer. Based on the dialogue script defined by the teacher, the system presents the learner with three possible responses in the S1 layer. At this point, the learner interacts with the dialogue system to complete the task-based dialogue process by using the topic-related information learned during the task preparation phase. The system was designed to enable learners to complete conversation tasks successfully and to guide learners to stay within the pre-defined script.
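A tree-shaped script like the one in Figure 4 can be represented as linked nodes, where each NPC line carries the expected student replies and the node each reply leads to. The sketch below is a hypothetical illustration: `ScriptNode`, `next_node`, the word-overlap matcher, and the 0.5 threshold are assumptions standing in for the system's semantic-vector matching, and the script lines are invented. It shows how one reply can jump directly to a deeper node and how an unmatched reply keeps the learner on the current node so the question can be repeated.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ScriptNode:
    npc_line: str  # what the NPC says at this layer (N1, N2, ...)
    # expected student replies (S layer) -> NPC node they lead to; empty dict = task complete
    branches: Dict[str, "ScriptNode"] = field(default_factory=dict)


def word_overlap(a: str, b: str) -> float:
    """Jaccard overlap of word sets; a crude stand-in for semantic-vector similarity."""
    wa = set(re.findall(r"[a-z']+", a.lower()))
    wb = set(re.findall(r"[a-z']+", b.lower()))
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def next_node(node: ScriptNode, reply: str, threshold: float = 0.5) -> Tuple[bool, ScriptNode]:
    """Follow the best-matching branch; if nothing matches, stay so the NPC can repeat or hint."""
    best: Optional[str] = None
    best_score = 0.0
    for expected in node.branches:
        score = word_overlap(reply, expected)
        if score > best_score:
            best, best_score = expected, score
    if best is None or best_score < threshold:
        return False, node
    return True, node.branches[best]


# A small food-ordering script in the spirit of Figure 4 (invented lines).
n3 = ScriptNode("Here is your order. Enjoy your meal!")
n2 = ScriptNode("Would you like anything to drink?", {"a cola please": n3, "no thanks": n3})
n1 = ScriptNode("Hi, what would you like to order?", {
    "I would like a burger": n2,
    "a salad please": n2,
    "just a coffee to go": n3,  # third answer jumps straight to the N3 node
})

matched, node = next_node(n1, "Just a coffee to go, please")
print(matched, node.npc_line)
```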
Note that the N3 block in the N2 layer is designed as the continuation of the third answer selected by the learner at the S1 layer; that is, the conversation jumps to the content of the N3 layer so that the task-based dialogue can be completed. The system can thus flexibly switch to or jump back to parts of a conversation. When the learner strays from the topic, the system can gently guide the conversation back to it. The learner can practice repeatedly, successfully complete the dialogue task, and improve their English speaking ability.

In order to train the dialogue robot for natural language understanding, the Wikipedia Corpus was used in this study [37]. Figure 5 shows the word2vec model, which represents the semantic meanings of words and sentences based on the Wikipedia Corpus data. A total of 14,000 articles were used to train the model. Table 1 shows the similarity test for sentence pairs based on the trained model. Cosine similarity is commonly used to measure the similarity of sentences or texts. Let s1 and s2 be the two vectors being compared; their cosine similarity is given by Formula (1):

$$\cos(s_1, s_2) = \frac{s_1 \cdot s_2}{\lVert s_1 \rVert \, \lVert s_2 \rVert} \qquad (1)$$

For sentence pairs with a similarity score of 0.8 or above, the trained model was able to capture the correct semantic meaning.
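Formula (1) and the 0.8 decision threshold can be sketched in Python as below. The paper does not state how word vectors are combined into a sentence vector; this sketch assumes averaging, a common choice, and uses a tiny hand-made embedding table (`TOY_VECTORS`, hypothetical) in place of the Wikipedia-trained word2vec model. Only the 0.8 threshold comes from the text.

```python
import math
import re
from typing import Dict, List, Sequence

SIMILARITY_THRESHOLD = 0.8  # threshold reported in the text

# Hypothetical 3-dimensional word vectors; a real system would load trained word2vec vectors.
TOY_VECTORS: Dict[str, List[float]] = {
    "want": [1.0, 0.0, 0.0],
    "like": [0.9, 0.1, 0.0],
    "burger": [0.0, 1.0, 0.0],
    "hamburger": [0.0, 0.95, 0.05],
    "ticket": [0.0, 0.0, 1.0],
}


def cosine_similarity(s1: Sequence[float], s2: Sequence[float]) -> float:
    """Formula (1): dot product divided by the product of the Euclidean norms."""
    dot = sum(a * b for a, b in zip(s1, s2))
    norm1 = math.sqrt(sum(a * a for a in s1))
    norm2 = math.sqrt(sum(b * b for b in s2))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # undefined for zero vectors; treat as "no similarity"
    return dot / (norm1 * norm2)


def sentence_vector(sentence: str, vectors: Dict[str, List[float]], dim: int = 3) -> List[float]:
    """Average the vectors of known words (an assumed, not reported, composition rule)."""
    words = [w for w in re.findall(r"[a-z']+", sentence.lower()) if w in vectors]
    if not words:
        return [0.0] * dim
    return [sum(vectors[w][i] for w in words) / len(words) for i in range(dim)]


def same_meaning(a: str, b: str) -> bool:
    score = cosine_similarity(sentence_vector(a, TOY_VECTORS), sentence_vector(b, TOY_VECTORS))
    return score >= SIMILARITY_THRESHOLD


print(same_meaning("I want a burger", "I would like a hamburger"))  # True
print(same_meaning("I want a burger", "I want a ticket"))           # False
```

With these toy vectors the first pair scores about 0.996 and the second exactly 0.5, so only the first is accepted as a paraphrase.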
The proposed model adopts a sequence-to-sequence recurrent neural network as the dialogue generator. The sequence-to-sequence model consists of two recurrent neural networks, an encoder and a decoder, intended to mimic how a person first understands and then replies: when the machine receives a natural language sentence, it gives a corresponding reply according to what it understands. The encoder digests the input sequence and converts it into a vector containing the contents of the original sequence. The decoder generates text from this vector, so the model can process input and output sequences of variable length, such as taking a question as input and producing a reply. The designed dialogue robot can thus interact with learners based on the pre-defined scripts.
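The encoder-decoder data flow can be illustrated with a deliberately tiny, untrained model in pure Python. This is a sketch, not the authors' network: it uses a plain Elman RNN cell with random weights, one-hot inputs, and greedy decoding, and the vocabulary and names (`TinySeq2Seq`, `encode`, `decode`) are hypothetical. It shows that inputs of any length are compressed into a fixed-size context vector, from which the decoder emits a variable-length reply until it produces an end token or reaches `max_len`.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def rnn_step(W: Matrix, U: Matrix, x: List[float], h: List[float]) -> List[float]:
    """Elman cell: h_new = tanh(W x + U h)."""
    return [
        math.tanh(sum(w * xi for w, xi in zip(W[i], x)) + sum(u * hk for u, hk in zip(U[i], h)))
        for i in range(len(W))
    ]


class TinySeq2Seq:
    """Untrained encoder-decoder; random weights, so replies are meaningless until trained."""

    def __init__(self, vocab: List[str], hidden: int = 8, seed: int = 0):
        rng = random.Random(seed)
        self.vocab = ["<s>", "</s>", "<unk>"] + vocab
        self.index = {w: i for i, w in enumerate(self.vocab)}
        v, self.hidden = len(self.vocab), hidden
        self.enc_W, self.enc_U = random_matrix(hidden, v, rng), random_matrix(hidden, hidden, rng)
        self.dec_W, self.dec_U = random_matrix(hidden, v, rng), random_matrix(hidden, hidden, rng)
        self.out_W = random_matrix(v, hidden, rng)  # hidden state -> vocabulary scores

    def one_hot(self, word: str) -> List[float]:
        x = [0.0] * len(self.vocab)
        x[self.index.get(word, self.index["<unk>"])] = 1.0
        return x

    def encode(self, tokens: List[str]) -> List[float]:
        h = [0.0] * self.hidden
        for tok in tokens:  # read the whole input, one token at a time
            h = rnn_step(self.enc_W, self.enc_U, self.one_hot(tok), h)
        return h  # fixed-size context vector regardless of input length

    def decode(self, context: List[float], max_len: int = 10) -> List[str]:
        h, word, reply = context, "<s>", []
        for _ in range(max_len):
            h = rnn_step(self.dec_W, self.dec_U, self.one_hot(word), h)
            scores = [sum(w * hk for w, hk in zip(row, h)) for row in self.out_W]
            word = self.vocab[max(range(len(scores)), key=scores.__getitem__)]  # greedy choice
            if word == "</s>":
                break
            reply.append(word)
        return reply


model = TinySeq2Seq(["hi", "what", "would", "you", "like", "a", "burger", "please"])
context = model.encode("what would you like".split())
print(len(context), model.decode(context, max_len=5))
```

In a working system the weights would be learned from question-reply pairs (for example, by backpropagation through time); the sketch only demonstrates the shape of the computation.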

Experiment Procedure and System
The experiment was conducted in a college-level English speaking class. Twenty-eight beginner-level students took part, completing three different tasks after three weeks of traditional English classes. Table 2 shows the three tasks given in the experiment. The teacher first designed three dialogue tasks based on the textbook and labeled each with a level reflecting its difficulty. The details of the three dialogue tasks are listed in Appendix A. Following the three-stage model of task-based language learning, the teacher first explained the task so that students understood what needed to be done during the conversation. In the task cycle, students then entered the system and talked with the dialogue system to complete the task. During the process, the system interacted with each student and recorded the student's behaviors, including pause time, answer time, number of errors, number of repetitions, and number of hints (reminders). In the language focus stage, the system evaluated students' performance and gave them feedback.

The dialogue scoring system designed in this study records the sentences expressed by the learners. The system analyzes and evaluates the content of each statement and assesses the learner's speaking comprehension. The results are presented to the teacher as the language enhancement phase of the three-stage approach proposed by Willis (1996) to improve the learner's language ability. When the learner selects a conversation topic, the dialogue scoring system acts as a teacher to evaluate the learner's conversation, grading it on criteria similar to those used by professional English teachers, such as timing, grammar, and correctness of responses.

Figures 6 and 7 show the student interfaces of the proposed system. Students receive the task cards given by the teacher and can see the current progress of each task, as shown in Figure 6. Once a student completes a task, the system gives the score right away and allows the student to trace back the records of the corresponding task, as shown in Figure 7. The student can replay or redo the task to improve its score.
Figure 8 shows the teacher interface for editing the conversational tree. The tree structure conveys the possible paths for the assigned task. Based on the tree, the system determines and guides the conversation according to the students' dialogue. While students conduct the task, the system monitors them and helps them complete it by providing hints or repeating the question.

Figure 9 shows the teacher interface for the task-based dialogue system. The teacher can create and manage the NPCs that interact with students, and the system dashboard allows the teacher to monitor students' progress.
Figure 9. Teacher editing interfaces.

Figure 10 shows the learning status of students, including detailed scores, time, and number of completions for each task. The score is given by the system's auto-scoring module. Different scoring mechanisms can be selected; the default is a rule-based point-deduction method, in which points are deducted for the wrong answers, repetitions, and hints used throughout the task. Through this interface, the teacher can replay and manually evaluate the conversations.
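The default point-deduction rule can be expressed in a few lines of Python. The per-event deductions and the 100-point full mark below are hypothetical placeholders: the paper says the deductions were suggested by the teacher but does not list them, and `TaskRecord` and `point_deduction_score` are illustrative names.

```python
from dataclasses import dataclass

# Hypothetical deduction per event; the study's teacher-suggested values are not reported.
DEDUCTIONS = {"wrong_answers": 5, "repetitions": 3, "hints": 4}
FULL_MARKS = 100


@dataclass
class TaskRecord:
    wrong_answers: int
    repetitions: int
    hints: int


def point_deduction_score(record: TaskRecord) -> int:
    """Start from full marks and subtract a fixed penalty each time a rule is triggered."""
    penalty = sum(getattr(record, name) * points for name, points in DEDUCTIONS.items())
    return max(0, FULL_MARKS - penalty)  # never go below zero


print(point_deduction_score(TaskRecord(wrong_answers=2, repetitions=1, hints=1)))  # 83
```

Keeping the deductions in a table makes it easy for a teacher to retune them without touching the scoring logic.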

Results
To address the research objectives, each conversation was recorded and evaluated by an English teacher. The teacher's scores were compared with those produced by the system's different scoring mechanisms to check the accuracy of automatic scoring. In addition, a questionnaire was distributed after the experiment to evaluate students' perceptions of the dialogue system.
The study collected a total of 636 records and data from 51 completed tasks. Each completed task was evaluated on five scoring criteria both by the task-based dialogue system and by the teacher of this class. The "correct" score refers to the score given by the teacher. The teacher was able to evaluate all the data the system recorded while students performed the tasks: because each task dialogue can be reproduced from the system records, the teacher could assess students' performance and give a score comparable to one based on face-to-face scoring criteria. Different combinations of the recorded criteria were tested to obtain an accurate prediction. The criteria were pause time, answer time, number of errors, number of repetitions, and number of hints (reminders). According to the teacher, students' performance was judged partly by the pause time after a question was asked, and the number of incorrect responses reflects comprehension of the given dialogue. The teacher also suggested the number of repetitions and the number of hints as criteria. The system recorded these criteria and used them to train a model to predict the teacher's "correct" score.
Three methods were tested in this experiment. The first was a rule-based evaluation method whose point-deduction rules were given by the teacher. It considered the number of errors, the number of repetitions, and the number of hints; points were deducted whenever a rule was triggered, and the deduction for each rule was also suggested by the teacher. The second and third methods used machine learning to predict the scores: a multilayer feed-forward neural network was trained with different criteria as the input data and the final score as the output.
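As a rough illustration of the second and third methods, the sketch below trains a small feed-forward network with a single hidden layer to map a feature vector (three or five criteria) to a score. It is written in plain Python so that it runs without dependencies. The hidden-layer size, learning rate, epoch count, feature scaling, and toy data are all assumptions, and the study's actual network configuration is not described here.

```python
import math
import random


def train_mlp(X, y, hidden=4, lr=0.05, epochs=2000, seed=0):
    """Train a one-hidden-layer regression network (tanh hidden units, linear
    output) by per-sample gradient descent on squared error; returns predict(x)."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j]) for j in range(hidden)]
        return h, sum(w * hj for w, hj in zip(W2, h)) + b2

    for _ in range(epochs):
        for x, target in zip(X, y):
            h, out = forward(x)
            err = out - target  # gradient of 0.5 * (out - target) ** 2 w.r.t. out
            for j in range(hidden):
                # Hidden-unit gradient uses W2[j] before it is updated; tanh' = 1 - tanh^2.
                grad_h = err * W2[j] * (1.0 - h[j] ** 2)
                W2[j] -= lr * err * h[j]
                for i in range(n_in):
                    W1[j][i] -= lr * grad_h * x[i]
                b1[j] -= lr * grad_h
            b2 -= lr * err

    return lambda x: forward(x)[1]


# Made-up data: [errors, repetitions, hints] scaled by 1/3; targets are scores / 100.
raw = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (2, 1, 1), (3, 2, 2), (1, 1, 0), (0, 2, 1)]
X = [[c / 3 for c in row] for row in raw]
y = [1 - 0.10 * e - 0.05 * r - 0.08 * h for e, r, h in raw]
predict = train_mlp(X, y)
print(round(predict([0, 0, 0]) * 100, 1), round(predict([1, 2 / 3, 2 / 3]) * 100, 1))
```

Switching from the three-criteria model to the five-criteria model only changes the length of each input row (adding pause time and answer time, suitably scaled); the training loop is unchanged.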
The second method used a neural network that took the same criteria as the first method as input, namely the number of errors, the number of repetitions, and the number of hints. The third method was also a neural network, but it considered all five criteria recorded by the system: pause time, answer time, number of errors, number of repetitions, and number of hints (reminders). The prediction models of both neural network methods were trained using the corresponding criteria and the expected scores given by the teacher. The system uses the M5P algorithm for nonlinear prediction. M5P is a machine-learning algorithm published by Yong Wang and Ian H. Witten in 1997 [38]. In practice, the targets of many machine-learning problems are continuous values, whereas many learning methods, such as most classification algorithms, predict only discrete classes; M5P is one of the algorithms that can predict continuous values. Models were trained and evaluated with 10-fold cross-validation, which is used to estimate the accuracy of an algorithm. The data set is divided into 10 parts; in turn, nine parts are used as training data and the remaining part as test data. Each round yields a correct rate (or error rate), and the average over the 10 rounds is used as the estimate of the algorithm's accuracy. In general, 10-fold cross-validation is repeated several times (for example, 10 times 10-fold cross-validation) and the results are averaged. Under 10-fold cross-validation, 90% of the data were used for training and the remaining 10% for testing in each round. Figure 11 shows the predicted results of the different system methods. The X-axis shows the 51 completed tasks.
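The 10-fold cross-validation procedure described above can be sketched in plain Python. The `fit` and `error` arguments are placeholders for any regressor and error measure (for example, a model tree such as M5P or a neural network). The seeded shuffle followed by round-robin fold assignment, the mean-predictor baseline, and the helper names are our own illustrative choices, not details reported by the study.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_validate(X, y, fit, error, k=10, seed=0):
    """Average test error over k folds; each fold is the test set exactly once.

    fit(X_train, y_train) must return a predict(x) function;
    error(predictions, targets) returns one number per fold.
    """
    scores = []
    for test in k_fold_indices(len(X), k, seed):
        test_set = set(test)
        train = [i for i in range(len(X)) if i not in test_set]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(error([predict(X[i]) for i in test], [y[i] for i in test]))
    return sum(scores) / len(scores)


def repeated_cross_validate(X, y, fit, error, k=10, repeats=10):
    """Repeat k-fold CV with different shuffles and average the results."""
    return sum(cross_validate(X, y, fit, error, k, seed=r) for r in range(repeats)) / repeats


def fit_mean(X_train, y_train):
    """Baseline 'model' that always predicts the training mean."""
    m = sum(y_train) / len(y_train)
    return lambda x: m


def mean_abs_error(preds, targets):
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets)


# Hypothetical data for 51 tasks; each fold holds 5 or 6 tasks (about 10%).
X = [[i] for i in range(51)]
y = [70 + (i % 5) for i in range(51)]
print(repeated_cross_validate(X, y, fit_mean, mean_abs_error))
```

With 51 tasks, one fold receives 6 tasks and the other nine receive 5, so each model is trained on roughly 90% of the data, as described above.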
In Figure 11, the Y-axis shows the scores each task received from the three automatic grading methods and from the classroom teacher. The detailed scores can be found in Appendix D. As shown in the figure, all three methods gave evaluations close to the teacher's. Table 3 shows the estimated error of the three methods: system rating with point-deduction rules, the machine-learning prediction model with three features, and the machine-learning prediction model with five features. For the predicted score $p_i$ and the correct score $t_i$ given by the teacher, the root mean squared error (RMSE) and the mean absolute error (MAE) were measured using Formulas (2) and (3). Machine-learning prediction using all five criteria produced the evaluation closest to the expected scores. This suggests that pause time and answer time are important factors when the teacher rates students' conversations.
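For reference, the two error measures used to compare the grading methods (Formulas (2) and (3)) can be computed as below, with `predicted` holding one method's scores $p_i$ and `teacher` holding the teacher's scores $t_i$. The function names and the sample scores are illustrative, not the study's data.

```python
import math


def rmse(predicted, teacher):
    """Root mean squared error between system scores p_i and teacher scores t_i."""
    n = len(teacher)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, teacher)) / n)


def mae(predicted, teacher):
    """Mean absolute error between system scores p_i and teacher scores t_i."""
    n = len(teacher)
    return sum(abs(p - t) for p, t in zip(predicted, teacher)) / n


system_scores = [80, 90, 70, 60]   # hypothetical p_i
teacher_scores = [83, 86, 70, 60]  # hypothetical t_i
print(rmse(system_scores, teacher_scores))  # sqrt((9 + 16) / 4) = 2.5
print(mae(system_scores, teacher_scores))   # (3 + 4) / 4 = 1.75
```

Because RMSE squares each difference, it penalizes a few large disagreements more heavily than MAE does. Reporting both therefore indicates whether a method's errors are spread evenly or concentrated in a few tasks.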
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(p_i - t_i\right)^2} \tag{2}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|p_i - t_i\right| \tag{3}$$
where $n$ is the number of evaluated tasks (51 in this study).
Figure 11. Predicted results of system methods.
Immediately after the chatting experiment, participants were asked to complete an online survey of 12 statements rated on a five-point Likert scale. The statements measured three aspects: (1) participants' perception of the user interface, (2) participants' perception of the chatting process compared to traditional instruction, and (3) participants' perception of the overall effectiveness of the system. Table 4 shows the results of the survey. Each aspect was measured by four statements of the questionnaire, as shown in Appendix B. Responses were scored from one point (strongly disagree) to five points (strongly agree), and the questionnaire results are shown in Appendix C. The averaged score (AVG) for each aspect is the mean score of its four statements and represents participants' view of that aspect.

Table 4. Results of the survey.
Survey Topics                                                                                 AVG
Participants' perception of the user interface (Q1-Q4)                                        3.125
Participants' perception of the chatting process compared to traditional instruction (Q5-Q8)  3.545
Participants' perception of the overall effectiveness of the system (Q9-Q12)                  3.511

Based on the results shown in Table 4, although participants agreed less with the user-interface statements (AVG below 3.5), they agreed that using the system to practice English conversation is better than traditional conversation practice (AVG above 3.5) and that the system (including composing and practicing dialogue) is effective in general (AVG above 3.5).
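The aspect averages in Table 4 can be reproduced by scoring each response from 1 (strongly disagree) to 5 (strongly agree) and averaging over the four statements of an aspect and over all participants. The sketch below shows this arithmetic under stated assumptions. The dictionary layout, the function names, and the two sample participants are hypothetical. The aspect groupings, the 3.5 cut-off, and the three reported averages come from the text above.

```python
ASPECTS = {
    "UI": ["Q1", "Q2", "Q3", "Q4"],
    "V.S traditional": ["Q5", "Q6", "Q7", "Q8"],
    "Effectiveness": ["Q9", "Q10", "Q11", "Q12"],
}


def aspect_average(responses, questions):
    """Mean Likert points (1 = strongly disagree ... 5 = strongly agree) over the
    given questions and all participants.

    responses: one dict per participant, mapping question id -> points.
    """
    points = [r[q] for r in responses for q in questions]
    return sum(points) / len(points)


def interpret(avg, threshold=3.5):
    """Read an aspect average against the 3.5 cut-off used in the text."""
    return "agreement" if avg > threshold else "less agreement"


# Two hypothetical participants' answers to the user-interface statements.
sample = [{"Q1": 3, "Q2": 4, "Q3": 3, "Q4": 2}, {"Q1": 4, "Q2": 4, "Q3": 3, "Q4": 3}]
print(aspect_average(sample, ASPECTS["UI"]))  # 26 / 8 = 3.25

# Averages reported in Table 4.
reported = {"UI": 3.125, "V.S traditional": 3.545, "Effectiveness": 3.511}
for aspect, avg in reported.items():
    print(f"{aspect}: {avg} -> {interpret(avg)}")
```

Averaging over all participant-statement pairs gives the same result as averaging the four per-statement means, provided every participant answered every statement.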
The results for the first statement of the questionnaire (Q1: The user interface is simple and easy to use) indicate that most participants consider the platform clearly designed and easy to use. However, many students were not satisfied with the recognition accuracy of the speech-to-text software (Q4: The voice recognition in the system is accurate; mean score 2.72). According to an informal interview with the instructor, many students became frustrated when the system replied that their answers could not be recognized (because of pronunciation, accent, or not using the pre-designed words or phrases). Once the instructor reminded the students to use only the words or phrases that had been taught or emphasized, all the students successfully completed the three tasks. Even so, some students' speech was still not always recognized correctly. For example, a couple of students kept saying "Two nights", but the system displayed "tonight" in the chat room. The speech-to-text function will therefore be modified to improve its recognition accuracy.
Figure 12 shows the overall results of the online survey. The blue line "UI" represents participants' perception of the user interface. The red line "V.S traditional" represents participants' perception of the chatting process compared to traditional instruction. The green line "Effectiveness" represents participants' perception of the overall effectiveness of the system. The X-axis indicates the corresponding statement of the questionnaire, and the Y-axis shows the average score for each statement.
As shown in Figure 12, participants responded positively regarding the overall effectiveness of the system and the chatting process compared to traditional instruction. Satisfaction with the user interface was lower, because students encountered unexpected problems with the speech-to-text recognition software and still tended to reply with simple phrases instead of complete sentences. Overall, however, students expressed above-average satisfaction with the conversation process and the system. They believed that the computer-assisted learning environment improved their learning motivation, considered the overall system design effective for English language learners to practice speaking, and indicated that they would continue to use the system.

Conclusions
This study analyzed a task-oriented English conversation learning system. The system simulates professional English teachers to provide a grammar and sentence scoring mechanism. A task-based dialogue framework was proposed, and a preliminary system was developed to test its effectiveness. The system was used in a college-level English speaking class to examine learners' perceptions of its user interface, learning style, and effectiveness, and the collected data were used to evaluate the possibility of replacing traditional English speaking practice with the proposed system. While learners perform tasks, the system records detailed learning data: in addition to grammar and vocabulary, these data include the pause time in the dialogue and the number of repeated answers. The proposed task-based dialogue robot simulates real-life conversation, so that, following the task-based language learning approach, students learn the language by carrying out the conversational tasks assigned by the system. This study uses a pre-defined dialogue tree to describe each conversational task and a large Wikipedia corpus to train the natural language capability of the dialogue robot. The collected student feedback shows positive perceptions of the system regarding the learning style and the learning outcomes. The system provides better semantic understanding and more accurate control of task-based conversations.
Compared to traditional learning methods, the system in this study conducts assessment automatically and analyzes learning status. Using the proposed framework, each dialogue is recorded, assessed, and compared with a regular conversation evaluation. The score is given by the auto-scoring module of the dialogue system. Three auto-grading methods were tested in this research; the dialogue system recorded the criteria suggested by the teacher and used them to train a model to predict the teacher's "correct" score. All three methods produced scores close to the teacher's, and the machine-learning model using all five criteria came closest. In addition, the questionnaire results indicate that students perceived learning with the task-based dialogue system as effective. The qualitative feedback from students also provides evidence of ease of use, the usefulness of repetitive practice, and instant response.

Limitations and Future Works
Several limitations were observed in this study. Only 51 completed tasks from 28 learners on three topics were collected, and this small quantity of data affected the scoring results of the machine-learning prediction models. Furthermore, the experiment was carried out in a university computer lab. Because the dialogue system relies heavily on microphones, interference from other students' voices was often an issue in the closed classroom space, so hardware equipment could be a crucial factor in such an environment. Finally, in this research, a language model was introduced into the dialogue manager module so that the module could determine the corresponding response sentence by calculating the similarity between sentences. To reduce language ambiguity, one possible, though challenging, solution is to train the language model on a larger corpus drawn from more sources.
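The dialogue manager's response selection can be sketched as follows: compute the similarity between the learner's utterance and each candidate sentence, and pick the best candidate above a threshold. The study uses a language model trained on Wikipedia for this step. The sketch instead substitutes plain bag-of-words cosine similarity so that it runs on its own. The candidate sentences, the 0.3 threshold, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lower-cased word counts; a stand-in for a learned sentence representation."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def best_match(utterance, candidates, threshold=0.3):
    """Return the candidate most similar to the utterance, or None if below threshold."""
    u = bag_of_words(utterance)
    score, sentence = max((cosine(u, bag_of_words(c)), c) for c in candidates)
    return sentence if score >= threshold else None


candidates = [
    "I would like to stay two nights",
    "I would like a single room",
    "Can I pay by credit card",
]
print(best_match("Two nights, please", candidates))  # the first candidate
print(best_match("tonight please", candidates))      # None: no shared words
```

Word overlap cannot relate "tonight" to "two nights" or recognize paraphrases that use different words. This illustrates why a richer language model trained on a larger corpus is proposed to reduce ambiguity.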
Since the current study focused on beginner-level learners, future research should examine and confirm these initial findings by exploring the effectiveness of the system when applied to higher-level learners. Furthermore, this system and research design could provide a starting point for discussion and future development of task-based conversation systems for languages other than English. Such further studies could make a useful contribution to the relevant literature.

Appendix B
Questionnaire
Q1. The user interface is simple and easy to use.
Q2. The dialogue guidance is helpful for completing the task.
Q3. The task flow is smooth and easy to follow.
Q4. The voice recognition in the system is accurate.
Q5. Compared to the traditional classroom teaching, the content in the system is easier to access.
Q6. Compared to the traditional classroom teaching, the language-learning content in the system is easier to learn.
Q7. Compared to the traditional classroom teaching, I am better motivated to learn by using this system.
Q8. Compared to the traditional classroom teaching, I perform better by using this system.
Q9. The scoring board and social interactive functions keep me motivated.
Q10. Overall, this system is helpful in English speaking practice.
Q11. Overall, I am satisfied with the experience of using this system.
Q12. I would like to continue to use this system.