Effects of Metacognitive Strategy Training on Chinese Listening Comprehension

Abstract: In an exploration of solutions to improve Chinese second language learners' listening comprehension, this quantitative quasi-experimental study examines the effects of metacognitive strategy training (MST) on learners' metacognitive awareness, listening performance, and proficiency in an intensive language training program. In contrast with the extant research, this study designed a metacognitive learning cycle model, including self-diagnosis, planning, monitoring, evaluation, regulation, and reflection strategies, as the content of the MST. Six classes, comprising a total of 80 participants, were assigned to three groups: Self-directed, teacher-led, and control groups. The Metacognitive Awareness Listening Questionnaire and a listening comprehension test were administered as pre- and post-tests, in addition to a proficiency test as a post-test only. Results demonstrate no significant differences in metacognitive awareness development, listening performance gains, and proficiency test results among the three groups. The results do indicate that the self-directed MST better enhances development of students' planning and evaluation awareness, and teacher-led MST workshops with special emphasis on the area of monitoring strategy will help students raise awareness. The findings of this study reveal that insufficient training time and MST without the integration of cognitive strategies do not yield significant effects. It is suggested that future MSTs should involve sufficient training time and effective follow-ups to ensure their positive effects. This study proposes that the effectiveness of MST could be improved by combining it with cognitive strategies training.
A one-way ANOVA was performed to examine whether the differences among the three groups were statistically significant. The ANOVA results indicate that there were no significant differences (F = 3.018, p = 0.055) in listening performance gains among the three groups, although the control group showed the highest mean performance gain. Additionally, Levene's test for the homogeneity of variance was employed, confirming that variances among the three groups were equal (F = 0.700, p = 0.500).
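As a rough illustration of the two statistics reported above, the following minimal sketch computes a one-way ANOVA F statistic and a Levene statistic using only the Python standard library. The gain scores are hypothetical and are not the study's data, and the mean-centered form of Levene's test is an assumption, since the text does not state which variant was used. Obtaining p-values would additionally require the F distribution, which is omitted here.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def levene_f(groups):
    """Levene statistic (mean-centered, an assumption): a one-way ANOVA
    on the absolute deviations of each score from its group mean."""
    deviations = [[abs(x - mean(g)) for x in g] for g in groups]
    return one_way_anova_f(deviations)


# Hypothetical gain scores (post-test minus pre-test), NOT the study's data.
hypothetical_gains = [
    [4, 6, 5, 7, 3],   # self-directed
    [5, 2, 6, 4, 5],   # teacher-led
    [7, 8, 6, 9, 5],   # control
]
print(one_way_anova_f(hypothetical_gains))
print(levene_f(hypothetical_gains))
```

A non-significant Levene result, as reported in the text, supports the equal-variance assumption behind the ANOVA F test.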


Background
Listening is an essential skill for communication and learning in second language acquisition (SLA). However, listening instruction has been neglected in language curricula and practices for a significant period of time (Field 2008; Luo and Gao 2012; Richards and Rodgers 2001; Rubin 1994; Vandergrift 1999). The reason for this neglect is not that listening is unimportant, but most likely that it is difficult to learn and teach. Some researchers (Graham 2002, 2006; Prince 2013; Vandergrift 2004) have stated that listening is the most difficult skill for learners to develop in SLA, since listening comprehension is a complex, unidirectional, and unobservable process. Graham (2006) posits that "given this complexity and perhaps because the process is largely unobservable, it may be difficult for learners to have a clear understanding of how they go about listening in a foreign language, or, more importantly, how they might improve their performance" (p. 166). Listening is a unidirectional process in which listeners cannot interrupt the speaker for clarification or repetition in some contexts, such as radio broadcasts (Graham et al. 2014). For these reasons, "listening is probably the least explicit of the four language skills, making it the most difficult skill to learn" (Vandergrift 2004, p. 4).
Many learners have struggled to improve listening comprehension during their second language acquisition (Chang and Read 2006; Goh 2000; Graham 2006; Prince 2013; Vandergrift 2004). In the past decades, the core issues of second language listening research have been the factors that contribute to listening skill development and methods to improve listening comprehension. Studies have shown that successful second language listeners used more metacognitive strategies in their listening tasks than less successful listeners (Goh 2000; Smidt and Hegelheimer 2004; Vandergrift 2002). Research has demonstrated that metacognitive awareness and strategies are crucial to learners' listening skill development. Providing metacognitive strategy training (MST) can help learners become more effective listeners (Goh 1999; Graham and Macaro 2008; Vandergrift 2003).
In order to assess second language listeners' metacognitive awareness and use of strategies, Vandergrift et al. (2006) developed the Metacognitive Awareness Listening Questionnaire (MALQ). The MALQ has been used to examine the effectiveness of MST in listening comprehension instruction in numerous studies (Altuwairesh 2013; Chang and Chang 2014; Coskun 2010; Gagen-Lanning 2015; Movahed 2014; O'Bryan and Hegelheimer 2009; Rasouli et al. 2013). Although these studies showed positive results for the use of the MALQ, some doubts related to MST instructional methods and effects remain, because these studies mainly adopted teacher-led methods to provide MST; some even directly used the assessment tool, the MALQ, to provide MST content (Coskun 2010; Rasouli et al. 2013). This use of the MALQ triggered more questions about MST, such as whether MST can be provided with a student-directed method and whether MST can achieve the same positive results when separate training materials are used instead of the assessment tool.
The purpose of this study is to examine the effects of MST delivered through different methods, specifically self-directed and teacher-led methods, on Chinese as a second language (CSL) learners' metacognitive awareness, listening performance, and proficiency in an intensive language training program.

Metacognitive Strategy Training (MST)
The concept of metacognition was introduced by Flavell in 1976. Flavell (1976) advocated that metacognition plays an essential role in various cognitive tasks, including language acquisition, comprehension, learning, and self-instruction. He affirmed that it is beneficial and desirable to increase metacognitive knowledge and improve metacognitive skills by providing systematic training for learners (Flavell 1979). Listening comprehension is a complex cognitive process involving attention, perception, memory, information processing, problem-solving, language, and learning; based on Flavell's theory, metacognition may play an important role in listening comprehension.
Recognizing the importance of metacognition, some educators have researched how to incorporate MST into listening comprehension instruction. Goh (2000) proposed four ways to integrate MST into listening instruction: (a) Discussing problems and strategies, (b) encouraging thinking aloud, (c) using listening diaries to reflect, and (d) incorporating metacognitive activities in pre- and post-listening tasks. Vandergrift (2007) stated that the instructional tools capable of raising metacognitive awareness include questionnaires, listening diaries, and discussions, because these tools can activate listening reflection activities for both learners and teachers. Vandergrift (2007) also introduced a pedagogical cycle involving five stages with seven steps for MST in listening instruction.
When providing MST, the first question to be answered is which metacognitive strategies should be taught. Flavell (1979) defined metacognition as a cognitive monitoring model and categorized it into four interacting components: Metacognitive knowledge, metacognitive experiences, tasks, and strategies. Metacognitive knowledge determines what strategies and actions a person adopts in a cognitive task, while metacognitive experiences occur to monitor the cognitive course, including planning beforehand, control in the process, and evaluation of the task afterwards. At least three strategies can be drawn from Flavell's description: Planning, controlling, and evaluating. Wenden (1998) classified metacognitive strategies in SLA as planning, self-monitoring, self-evaluation, and self-reinforcement. Anderson (2002) proposed a model of metacognition that includes five components: "(a) Preparing and planning for learning, (b) selecting and using learning strategies, (c) monitoring strategy use, (d) orchestrating various strategies, and (e) evaluating strategy use and learning" (Anderson 2002, p. 2). Compared with Wenden's (1998) framework, Anderson's (2002) model lacks a component allowing learners to regulate after evaluation. Furthermore, there is an overlap between (b) "selecting and using learning strategies" and two other components: (a) planning and (d) orchestrating. However, Anderson's inclusion of orchestration is noteworthy, as this concept has seldom been mentioned in other metacognition models. As Anderson stated: "The ability to coordinate, organize, and make associations among the various strategies available is a major distinction between strong and weak second language learners" (p. 4). Alongside the MALQ, Wenden's framework and Anderson's model have been widely adopted as training content in MST studies.

Effects of MST
MST has different effects on the learning of different subjects for students of different ages. For example, Diebold's (2011) and Prestwich's (2008) studies showed that MST had no significant effect on fourth grade students' reading, but a number of other studies have demonstrated positive effects of MST. With the greater attention to metacognitive strategies in the SLA literature, an increasing number of studies over the past decade have examined the effects of MST on SLA learners' reading and listening development. Sterling's (2011) reading research did not find positive effects from MST, but other reading studies and almost all listening studies have found positive effects.
Among the SLA listening studies, participant samples vary in size: some have as few as three participants (Gagen-Lanning 2015), some around twenty (Altuwairesh 2013; Coskun 2010), and some more than forty (Abdelhafez 2006; Movahed 2014; Nosratinia et al. 2015; Chang and Chang 2014). Rasouli et al. (2013) conducted an MST study as a large-scale experiment. They examined the effectiveness of MST on 120 Iranian English as a second language (ESL) students' listening proficiency, and their results indicated that MST had a positive impact on the participants' English test results. Rasouli et al.'s (2013) study used the MALQ as an assessment. Unfortunately, Rasouli et al. did not report the MALQ results and did not discuss the participants' metacognitive awareness changes before and after MST.
The MST treatments in these listening studies also differ in length of time, training content, and instruments. In Abdelhafez's (2006) study, the experimental group received a 12-week training course comprising three sets of metacognitive strategies. Coskun's (2010) study adopted the CALLA model proposed by Chamot and O'Malley (1994) and the MALQ (Vandergrift et al. 2006) as training materials for a 5-week MST treatment. Movahed's (2014) study used Vandergrift's (2007) pedagogical cycle to deliver MST, and the instruments included an anxiety scale, the MALQ, and the Test of English as a Foreign Language (TOEFL). Nosratinia et al. (2015) adopted Anderson's model to provide MST in over 18 sessions.
It is worth noting that the MST methods utilized in the listening studies are also quite diverse. Altuwairesh (2013) employed a two-phase treatment: MST with guided listening diaries to encourage self-reflection, and deliberate practice. The study recognized the usefulness of both phases, but emphasized the significance of deliberate practice and claimed that MST was a necessary part of deliberate practice. Chang and Chang (2014) integrated MST with an online videotext listening activity to investigate their combined effects. Their study used self-dictation-generation (SDG) activities on YouTube to emphasize the use of metacognitive strategies in listening processes. In addition to comparing the learners' achievement between pre- and post-listening tests, a questionnaire, the Strategy Inventory for Language Learning (SILL) developed by Oxford (1990), and a focus-group interview were also implemented to examine listening strategy use. The study found that the online video-SDG activity facilitated the participants' development of a reflection strategy, which constitutes an alternative instructional method to deliver MST. Similarly, Gagen-Lanning (2015) investigated the impact of MST on students' self-directed use of assisted technology in ESL listening. Gagen-Lanning delivered two 60-min sessions, including metacognitive strategies and TED Talk videos, to three participants, and then encouraged them to use self-directed TED Talk videos for their listening improvement. The study utilized the MALQ, a listening worksheet, screencasting software, and a follow-up survey to collect data. The screencast analysis showed what the participants actually did during the listening task. It is important to check what learners actually do after MST instead of only relying on questionnaire responses, but the sample size (three participants) was too small to draw a reliable conclusion from the findings. The study also found that MST could promote self-directed learning, but whether metacognitive strategies can be learned more effectively with the self-directed method in MST remains unanswered.
Although there are many differences in the MST treatments cited above, they have two points in common. One is that MST improved listening performance and metacognitive strategy use. The other is that the participants of these studies were all ESL learners. Goh (2008) posited that it is necessary to examine MST and the MALQ in various target language contexts.

Chinese Listening Research
Compared with ESL, CSL research is still developing, despite the fact that CSL instruction has a long history. There is little listening research in the CSL field. The reason could be either that listening is difficult to learn and research (Vandergrift 1997) or that listening just simply receives less attention. Oxford (1993) commented that language teachers frequently ignored listening as an essential skill in language learning. Although there have been studies (Chang 2010) investigating learners' reading metacognitive strategies in CSL, the number of listening strategies studies is very low, and the study of listening metacognitive strategies is rare.
The status of Chinese listening research is reflected in the following example from a well-known Chinese academic journal. The Journal of Chinese Language Teachers Association is an academic platform to exchange CSL studies and teaching ideas in the United States. It publishes three issues each year and has been in operation since 1966. Only seven listening studies were found in this journal from 1966 to 2016, whereas there were more studies in speaking (12), reading (36), and writing (20). Among the seven studies, four articles were reviews of listening textbooks or materials, two discussed listening pedagogical issues, and only one, Cai's (2013) study, was a listening research paper. With 51 CSL learners, Cai (2013) investigated four factors affecting Chinese listening proficiency and whether language heritage had an impact on these factors. Cai's (2013) findings revealed that vocabulary and grammar knowledge are more critical for the development of listening proficiency than sound discrimination skill and metacognitive knowledge, the latter assessed by a revised version of the MALQ. Although Cai's study utilized the MALQ, the correlation study is descriptive research and only reflects the relationship between metacognitive knowledge and the listening proficiency of Chinese heritage learners. Whether a cause-effect relationship exists between MST and the listening proficiency of Chinese non-heritage learners still needs to be researched. Further metacognitive research in the CSL field is needed.

Significance of the Study
The absence of studies concentrating on MST in the CSL context highlights the significance of the present study. This study is of importance for several reasons. Firstly, this study may fill gaps in the MST, MALQ, and even metacognition research in the CSL listening literature, given that it examines the effects of MST with a different target language and population from those of the existing literature. Chinese listening strategies have not been explored sufficiently in CSL, and further studies are necessary to develop the current knowledge. The results of this study may lead to a deeper understanding of the effects of MST in a CSL context. Secondly, this study presents a new perspective for MST research in the SLA domain. The existing literature on MST mainly focuses on investigating whether MST positively impacts learners' listening and reading comprehension. This study moves one step further by establishing a metacognitive learning cycle (MLC) model to provide MST content and by examining the effects of different MST methods. Thirdly, as there is controversy among existing studies surrounding the effects of MST, the findings of this empirical study could provide researchers with a more comprehensive understanding of MST and present further evidence to support educators and instructional leaders in decision making when adopting MST in world languages programs.

Research Questions
To achieve the purpose of this study, three research questions (RQ) are addressed:
RQ1: Are there significant differences in metacognitive awareness development among two experimental groups receiving MST (self-directed versus teacher-led) and a control group?
RQ2: Are there significant differences in Chinese listening performance gains among two experimental groups receiving MST (self-directed versus teacher-led) and a control group?
RQ3: Are there significant differences in the results of a Chinese listening proficiency test among two experimental groups receiving MST (self-directed versus teacher-led) and a control group?

Research Design
This study employed a quantitative quasi-experimental research method, including single factor within-subject and between-subjects designs. The within-subject design tested whether there were differences in metacognitive awareness and Chinese listening performance between the pre- and post-tests of experimental participants, as measured by the MALQ and classroom listening tests, respectively. To ensure that the three groups, which were composed of the original classes, were comparable, the between-subjects design examined whether there were differences in metacognitive awareness development and Chinese listening performance gains; that is, it compared the changes in the MALQ and the classroom listening test scores, respectively, between the pre-tests before the MST intervention and the post-tests after it. For the examination of final listening proficiency, this study used a statistical method to adjust for possible preexisting differences in the three groups, with the classroom listening pre-test as a covariate. The independent variable was the MST method, including three levels: Self-directed training, teacher-led training, and no training. The dependent variables were participants' metacognitive awareness, Chinese listening performance, and Chinese listening proficiency.
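The between-subjects comparison described above rests on gain scores, that is, each participant's post-test score minus the pre-test score, grouped by MST condition. A minimal sketch of that computation follows; the record layout and the sample values are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from statistics import mean


def gain_scores_by_group(records):
    """Group gain scores (post - pre) by condition.

    records: iterable of (group, pre_score, post_score) tuples.
    Returns a dict mapping each group name to a list of gains.
    """
    gains = defaultdict(list)
    for group, pre, post in records:
        gains[group].append(post - pre)
    return dict(gains)


# Hypothetical pre/post scores, NOT the study's data.
records = [
    ("self-directed", 30, 36),
    ("self-directed", 28, 31),
    ("teacher-led", 33, 37),
    ("teacher-led", 29, 35),
    ("control", 31, 38),
    ("control", 27, 32),
]
gains = gain_scores_by_group(records)
for group, values in gains.items():
    print(group, values, mean(values))
```

The per-group lists produced here are the kind of input a one-way ANOVA on gains would take.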

MST Content
In contrast with the extant research, this study designed a metacognitive learning cycle (MLC), including self-diagnosis, planning, monitoring, evaluation, regulation, and reflection strategies, to provide MST. In addition to the four essential metacognitive strategies (planning, monitoring, evaluation, and regulation) highlighted by early researchers, this study incorporated self-diagnosis and reflection into the MST. In the MLC, self-diagnosis is the starting point, and reflection the end point. In practice, the two strategies can be applied simultaneously to form a continuous metacognitive cycle, which means the self-diagnosis is a reflection on previous actions, and the reflection is also a self-diagnosis for the following task. This cycle is displayed in Figure 1.
The training material of the MST included an introduction and the MLC strategies content. The introduction presents what metacognition is, along with why and how to learn metacognitive strategies. The content of the self-diagnosis strategy contains how to self-diagnose listening problems and possible reasons based on a self-assessment with the Listening Self-Diagnosis Assistant (LSDA). The planning strategy consists of two parts: How to make plans and how to effectively manage time. The monitoring strategy states how to focus attention and process meaning during a listening activity. The evaluation strategy focuses on how to assess the quality of listening comprehension and the effectiveness of strategy use. The regulation strategy is about how to reinforce and adjust strategy use after evaluation. The reflection strategy introduces how to analyze and summarize one's own listening performance and problems using answers to the guided questions in the training material. The content of each strategy covers three parts: What, why, and how.

MST Instruction
In this study, two experimental groups received the same content of the MLC as previously described. The designed learning time of the MLC was the same for both self-directed and teacher-led groups: 90 min. Both groups' participants were encouraged to apply the MLC strategies to daily listening activities and record their strategy use on provided worksheets. There was no preset time length for application activities. The difference between the two experimental groups lies in the instructional method, including training means, procedures, and activities.
The self-directed group attended a self-study session and the training lasted six weeks. The participants spent 15 min on learning one strategy from a handout prepared by the researcher each Monday, and then followed instructions on an application worksheet to practice and record their strategy use every day in that week. The application worksheet provided step-by-step instructions for the strategy use, and varied each week based on the strategy introduced. The application tasks on the worksheet were integrated into the participants' listening homework in this group. A sample page of the self-directed group's worksheets is in Appendix A.
By contrast, the teacher-led group received two 45-min workshops. In the workshops, the researcher introduced the MLC strategies with PowerPoint slides and explained why, when, and how to apply the strategies, covering the same content as the handout for the self-directed group. The teacher-led group participants were encouraged to apply the strategies in their daily listening activities whenever and wherever needed after the workshops. They were also asked to record their strategy use on a provided worksheet. However, there were no step-by-step guided instructions for strategy use and no one-by-one focus for each week in the teacher-led group. The worksheet requested the participants to include the date, and to circle the strategy used and the activity applied. A sample of the teacher-led group worksheet is provided in Appendix B.

Participants and Sampling
The population of this study was the students learning Chinese as a second language in an intensive training program at a world languages center on the West Coast of the United States. There were 30 classes, including approximately 360 students, in the program when this study started. Among these classes, there were 10 classes in each of the three semesters in the year. As this quasi-experiment was conducted in a real teaching setting, some practical factors had to be considered, such as the appropriate timing of training, available assessment tools, integration with the current curriculum, and comparability of different group participants. These factors led to the decision to adopt a cluster sampling method for participant selection. According to Babbie (2007, as cited in Creswell 2008, p. 148), "Cluster sampling is ideal when it is impossible or impractical to compile a list of the elements composing the population".

In the first step of cluster sampling, the 10 classes of Semester Three became the targeted participating classes. Among the 10 classes, three would graduate in one month and therefore lacked sufficient time to complete the experiment. The remaining seven classes were all invited to voluntarily participate in the study. Six classes accepted the invitation and became participating classes. Students in the same class were placed in the same group of the study, in an attempt to mitigate the threat of intervention diffusion among students. There were no heritage learners in the six classes. The students' level of Chinese listening proficiency was Intermediate High or Advanced Low in Semester Three.
For the second step of cluster sampling, the six participating classes were randomly assigned to three groups: Self-directed, teacher-led, and control groups, whereby each group comprised two classes. Eighty out of 86 students voluntarily participated in the study, and 72 completed both the pre- and post-tests of the MALQ. Four classes, with 49 participants in total, were the experimental groups. Two of these classes, with 21 participants, received the self-directed MST, and the other two, with 28 participants, had the teacher-led MST. Another two classes, with 31 students in total, formed the control group. The participants' demographic information is provided in Table 1.
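The second sampling step, in which whole classes rather than individual students are randomly assigned to conditions, can be sketched as follows. This is only an illustration of the general procedure: the class labels and the seed are hypothetical, and the text does not describe the actual randomization mechanism that was used.

```python
import random


def assign_classes(class_ids, conditions, seed=None):
    """Randomly assign intact classes to conditions in equal numbers.

    class_ids: list of class labels (hypothetical identifiers).
    conditions: list of condition names.
    Returns a dict mapping each condition to a sorted list of classes.
    """
    if len(class_ids) % len(conditions) != 0:
        raise ValueError("classes cannot be split evenly across conditions")
    rng = random.Random(seed)
    shuffled = list(class_ids)
    rng.shuffle(shuffled)
    per_condition = len(shuffled) // len(conditions)
    return {
        condition: sorted(shuffled[i * per_condition:(i + 1) * per_condition])
        for i, condition in enumerate(conditions)
    }


# Six hypothetical class labels assigned to the three conditions.
assignment = assign_classes(
    ["A", "B", "C", "D", "E", "F"],
    ["self-directed", "teacher-led", "control"],
    seed=2024,
)
print(assignment)
```

Assigning at the class level keeps classmates in the same condition, which is what limits diffusion of the intervention between groups.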

Instrumentation
There were three dependent variables in this quantitative study. The first was listening metacognitive awareness, which was measured by the MALQ. The second, Chinese listening performance, was assessed by the Chinese Listening Comprehension Test (CLCT), a classroom test developed by a test team in the program. The third dependent variable was Chinese listening proficiency, represented by the Defense Language Proficiency Test (DLPT) listening score. The reasons for adopting the two listening test instruments are their convenience and standardization. The CLCT is not a standardized test, but it is convenient for checking changes in participants' listening performances before and after MST. As the DLPT is administered only once at the end of the program at this particular language center, it cannot show the participants' listening performance gains before and after MST; but it is a standardized proficiency test that has been validated.

MALQ
The MALQ (Vandergrift et al. 2006) was designed to assess second language listeners' metacognitive awareness. The reliability of the MALQ ranges from 0.68 to 0.78 according to Cronbach's alpha. During its validation process, the MALQ developers used the questionnaire to assess French and English learners' listening metacognitive awareness, and a five-factor model emerged based on a confirmatory factor analysis.
The MALQ has been used for English learners from different countries, such as China, Singapore, Iran, and Saudi Arabia, but it has not been used for CSL non-heritage learners. The researcher received permission to use the MALQ in this study from a leading developer. This study used the original questionnaire without any revisions.
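The reliability figures cited for the MALQ are Cronbach's alpha coefficients. For readers unfamiliar with the statistic, the sketch below computes alpha from a respondents-by-items score matrix using the standard formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores). The response data are hypothetical and are not MALQ responses.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-respondent item-score lists.

    Every inner list must have the same length (one score per item).
    Sample variances are used for both the items and the total scores.
    """
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # transpose to per-item columns
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(r) for r in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical responses from five respondents to four items.
hypothetical_responses = [
    [4, 5, 4, 5],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 3],
    [4, 4, 5, 4],
]
print(round(cronbach_alpha(hypothetical_responses), 3))
```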

CLCT
The CLCT was developed by an ad hoc test team in the Chinese program at this language center for the purpose of providing a complete test sample to familiarize students with the DLPT format. The CLCT is a traditional paper-and-pencil instrument and is administered at the beginning and end of Semester Three in the program. The CLCT includes 40 authentic passages and 60 questions, each with four answer options and only one correct answer. It is graded with raw scores, ranging from 0 to 60 according to the number of questions answered correctly. The answer key was provided to each rater, thereby ensuring inter-rater reliability across classes.

DLPT
The DLPT is a validated high-stakes proficiency test. It was designed to assess native English speakers' foreign language proficiency as defined by the Interagency Language Roundtable Skill Level Descriptions. The DLPT is used government- and military-wide in the U.S. and comprises two separate tests, listening and reading; it is available in different languages. The Chinese DLPT listening test consists of 40 authentic passages with 60 questions related to those passages. The Chinese DLPT is in a multiple-choice format, and each question is followed by four answer choices. The test is delivered on computers at an appointed testing center and graded at the same location. Test scores are reported as level numbers, including 0+, 1, 1+, 2, 2+, and 3, based on the number of questions answered correctly.

Worksheets
Strategy application worksheets used by the two experimental groups' participants were an additional instrument in the study. Both groups' participants were encouraged to record their strategy applications on the worksheets. Both groups' worksheets were graded with the same rubric, to quantify the quality and quantity of the experimental participants' actual strategy applications. The scoring rubric included two parts, quantity and quality scores: Quantity was the number of uses of a strategy on each day, and quality represented how well the strategy was used each day, rated as none (0 points), fair (1 point), good (2 points), or excellent (3 points). The worksheet rubric is in Appendix C. The worksheet instrument not only helped the participants apply the strategies learned, but also provided a tool to observe the participants' actual strategy use in MST.
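To make the two-part rubric concrete, the sketch below scores one participant's worksheet entries. The point values for the quality labels come from the description above; summing the daily values into totals is an assumption made for illustration, since the aggregation rule is specified in Appendix C rather than in this section, and the sample entries are hypothetical.

```python
# Quality labels and points as described in the rubric.
QUALITY_POINTS = {"none": 0, "fair": 1, "good": 2, "excellent": 3}


def score_worksheet(entries):
    """Score a worksheet given daily (uses, quality_label) entries.

    Returns (quantity_total, quality_total). Summing across days is an
    assumed aggregation, not necessarily the rule used in the study.
    """
    quantity_total = sum(uses for uses, _ in entries)
    quality_total = sum(QUALITY_POINTS[label] for _, label in entries)
    return quantity_total, quality_total


# Hypothetical week of entries for one strategy.
week = [(2, "good"), (0, "none"), (1, "excellent"), (1, "fair"), (3, "good")]
print(score_worksheet(week))
```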
The four instruments formed a continuum of the actual strategy use (worksheets), self-reported strategy use (MALQ), developing listening performance (CLCT), and final listening proficiency (DLPT), which helped to methodically investigate the effects of MST on the experimental participants' listening comprehension development.

Procedure
All participants gave their informed consent for inclusion before they participated in this study, which was approved by the ethics committees of the researcher's university and affiliation. Before the intervention (the MST), both the experimental and control groups took the CLCT as a pre-test in the first week of Semester Three. The teachers graded the CLCT. Participants completed the MALQ in the second week of Semester Three to assess their listening metacognitive awareness. The researcher administered the MALQ and collected the responses.
The MST started after the completion of the MALQ, in the third week of Semester Three. The self-directed group participants self-studied one strategy delivered on a handout for 15 min each Monday, and then followed a worksheet to practice and record the strategy use from Tuesday to Friday; the training lasted six weeks, and the total learning time for the six MLC strategies was 90 min. The teacher-led group attended two 45-min workshops to learn the six MLC strategies through the trainer's PowerPoint presentations. The first workshop introduced the first three MLC strategies in the third week, and the second workshop covered the other three strategies in the fourth week. The participants were encouraged to use a worksheet to record their strategy use on a daily basis after the first workshop. The researcher was the only trainer for the two experimental groups. The researcher developed and provided all learning materials for the self-directed group and conducted MLC workshops for the teacher-led group.
After the MST that lasted 6 weeks, both experimental and control groups completed the MALQ again and the second set of CLCT in their ninth week of Semester Three, and then took the Chinese DLPT in the last week. The researcher collected all data and conducted the analysis.

RQ1: Are there significant differences in metacognitive awareness development among the two experimental groups receiving MST (self-directed versus teacher-led) and the control group?
Among the 80 participants, 72 took both the pre- and post-tests of the MALQ. The data analysis of RQ1 was based on the 72 pairs of MALQ responses. The responses to six items (3, 4, 8, 11, 16, 18) were reverse coded according to the MALQ Scoring and Interpretation Guide. In order to accurately evaluate participants' metacognitive awareness development and reduce the bias of participants' possible differences before the MST, the difference for each item between the pre- and post-tests was calculated first by subtracting the pre-test value from the post-test value. The differences among the two experimental groups and the control group were compared with a one-way ANOVA. In order to look into each group's metacognitive awareness development, each group's pre- and post-test results were compared with a paired-samples t-test.
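The reverse coding and difference-score steps can be sketched in a few lines of Python. This is an illustration only: it assumes a 6-point response scale (so a raw rating r is reverse coded as 7 - r), and the function names and sample responses are hypothetical.

```python
# Items named in the text as reverse coded.
REVERSE_ITEMS = {3, 4, 8, 11, 16, 18}
SCALE_MAX = 6  # assumption: 6-point Likert scale


def reverse_code(responses):
    """responses: dict mapping item number -> raw rating (1..SCALE_MAX)."""
    return {
        item: (SCALE_MAX + 1 - rating) if item in REVERSE_ITEMS else rating
        for item, rating in responses.items()
    }


def item_differences(pre, post):
    """Post-test minus pre-test for each item, after reverse coding."""
    pre_coded, post_coded = reverse_code(pre), reverse_code(post)
    return {item: post_coded[item] - pre_coded[item] for item in pre_coded}


# Hypothetical responses of one participant to three items.
pre = {1: 4, 3: 5, 4: 2}
post = {1: 5, 3: 3, 4: 2}
print(item_differences(pre, post))  # {1: 1, 3: 2, 4: 0}
```

Because reverse coding is applied before subtraction, a drop in the raw rating on a reverse-coded item (item 3 above) appears as a positive difference.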
RQ2: Are there significant differences in Chinese listening performance gains among the two experimental groups receiving MST (self-directed versus teacher-led) and the control group?
All 80 participants completed the pre- and post-tests of the CLCT, and the data analysis was based on their scores in the two tests. A one-way ANOVA was used to compare the three groups' listening performance gains, that is, the differences between their pre- and post-test scores. A paired-samples t-test was performed, followed by a Pearson correlation analysis, to investigate each group's gains and their relationship with metacognitive awareness development.
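For readers unfamiliar with the computation, the F statistic of a one-way ANOVA on gain scores can be sketched as follows. The function name and the gain scores are hypothetical and are not the study's data; a statistics package would normally be used and would also report the p-value.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of lists of scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical gain scores (post-test minus pre-test) for three groups.
gains = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(one_way_anova_f(gains))  # (3.0, 2, 6)
```

With 80 participants in three groups, the corresponding degrees of freedom would be 2 and 77.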
RQ3: Are there significant differences in the results of a Chinese listening proficiency test among the two experimental groups receiving MST (self-directed versus teacher-led) and the control group?
All participants completed the Chinese DLPT at the end of the experiment. Their DLPT listening results varied from level 0+ to level 3, specifically 0+, 1, 1+, 2, 2+, and 3. The levels from 0+ to 3 were transformed into numbers from 0 to 5. As participants took the CLCT as a pre-test before the MST, this study made the pre-test scores a covariate when comparing the three groups' DLPT results.
RQ3 was answered with an analysis of covariance (ANCOVA), a general linear model that blends ANOVA and regression, followed by a Pearson correlation analysis. The ANCOVA was used to reduce the bias from participants' possible different performances before the MST and to accurately evaluate the effectiveness of MST. The correlation analysis examined relationships between the DLPT results and the five factors identified in the MALQ.
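A minimal sketch of a one-covariate ANCOVA, using the classical adjusted sums of squares, may help clarify how the covariate enters the comparison. It assumes a common within-group slope, and the function names and data are hypothetical rather than taken from the study; it returns the F statistic, the degrees of freedom, and the covariate-adjusted group means.

```python
def _mean(values):
    return sum(values) / len(values)


def _sums(xs, ys):
    """Return (SSx, SSy, SPxy) computed about the means of xs and ys."""
    mx, my = _mean(xs), _mean(ys)
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    spxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return ssx, ssy, spxy


def ancova_f(groups):
    """groups: list of (covariate_list, outcome_list) pairs, one per group.

    Returns (F, df_between, df_error, adjusted_means).
    """
    all_x = [x for xs, _ in groups for x in xs]
    all_y = [y for _, ys in groups for y in ys]
    ssx_t, ssy_t, sp_t = _sums(all_x, all_y)

    # Pool the within-group sums of squares and cross-products.
    ssx_w = ssy_w = sp_w = 0.0
    for xs, ys in groups:
        ssx, ssy, sp = _sums(xs, ys)
        ssx_w, ssy_w, sp_w = ssx_w + ssx, ssy_w + ssy, sp_w + sp

    # Outcome variation left after removing the linear effect of the covariate.
    adj_total = ssy_t - sp_t ** 2 / ssx_t
    adj_within = ssy_w - sp_w ** 2 / ssx_w
    adj_between = adj_total - adj_within

    k, n = len(groups), len(all_x)
    df_between, df_error = k - 1, n - k - 1
    f = (adj_between / df_between) / (adj_within / df_error)

    # Group means adjusted to the grand mean of the covariate.
    slope = sp_w / ssx_w
    grand_x = _mean(all_x)
    adjusted = [_mean(ys) - slope * (_mean(xs) - grand_x) for xs, ys in groups]
    return f, df_between, df_error, adjusted


# Hypothetical data: (pre-test scores, proficiency codes) for two groups.
data = [([1, 2, 3, 4], [2, 1, 4, 3]), ([1, 2, 3, 4], [12, 11, 14, 13])]
print(ancova_f(data))
```

With three groups, one covariate, and 80 participants, the error degrees of freedom in such a model would be 76.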

Metacognitive Awareness Development
After the MST, the two experimental groups demonstrated more awareness than the control group in the metacognitive factors. Table 2 presents the results of the MALQ pre- and post-tests. The overall means of the three participant groups increased from the pre-test to the post-test, with the self-directed group showing the greatest increase (4.48), followed by the teacher-led (3.19) and the control (2.21) groups. There are 21 items constituting five distinct factors/subscales in the MALQ: Problem-solving (PS), planning and evaluation (PE), mental translation (MT), directed attention (DA), and person knowledge (PK). For each of the five factors, the difference between pre- and post-test scores was calculated; this difference represents "metacognitive awareness development." The descriptive statistics of the metacognitive awareness development for the five factors are shown in Table 3. The self-directed group's means of PE, DA, and PK factors were higher than those of the teacher-led and control groups, particularly on PE and PK factors, but lower on PS and MT factors, where the control group showed the greatest development. A one-way ANOVA was conducted to compare the three groups' metacognitive awareness development within the five factors. The results indicate that there were no statistically significant differences in metacognitive awareness development among the two experimental groups (self-directed and teacher-led) and the control group.
In order to examine metacognitive awareness development within groups, each group's pre- and post-scores for the five factors were compared using a paired-samples t-test. The t-test results presented in Table 4 show that there were significant differences in PE factor development within the self-directed (p = 0.015), teacher-led (p = 0.007), and control (p = 0.008) groups. The mean differences indicate that the self-directed group (M = 3.524) showed the greatest difference between pre- and post-tests, followed by the teacher-led (M = 2.407) and control (M = 1.833) groups. Additionally, there was a significant difference in DA factor development within the teacher-led group (p = 0.05), although the self-directed group showed the greatest mean difference (M = 1.000). This indicates that all three groups showed a significant increase in metacognitive awareness development on the PE factor, regardless of MST intervention, but only the teacher-led group had a significant increase in the DA factor after MST. Table 4 confirmed the results from Tables 2 and 3. That is, except for PS and MT, two cognitive-focused skills, the other three metacognitive factors clearly demonstrated higher means for the experimental groups than for the control group. The participants in the two experimental groups were asked to apply and record their metacognitive strategy use, and, although the participants did not record many uses, their application data were collected and graded according to a scoring rubric. The application scores were analyzed with a Pearson correlation to investigate relationships between strategy application and metacognitive awareness development on the five factors. There were no significant correlations between participants' strategy applications and metacognitive awareness development on the five factors (p > 0.05). Table 5 shows descriptive statistics for participants' pre- and post-test results in the CLCT.
Table 6 presents the three groups' listening performance gains, which are the differences between their post- and pre-test scores. The results in Table 6 show that the control group had the highest gain (M = 9.19; SD = 4.64), followed by the teacher-led (M = 6.82; SD = 4.46) and self-directed (M = 6.00; SD = 5.99) groups. A one-way ANOVA was performed to examine whether the differences among the three groups were statistically significant. The ANOVA results indicate that there were no significant differences (F = 3.018, p = 0.055) in listening performance gains among the three groups, although the control group showed the highest mean in the performance gain. Additionally, a Levene's test for the homogeneity of variance was employed, confirming that variances among the three groups were equal (F = 0.700, p = 0.500).

Listening Performance Gains
Before running the paired-samples t-test, the database was split by the MST variable. The paired-samples t-test results in Table 7 show that there were significant differences between the pre- and post-test scores in the CLCT for all three groups: self-directed (p < 0.001), teacher-led (p < 0.001), and control (p < 0.001). A Pearson correlation analysis was adopted to examine whether the significant listening performance gains in the three groups were related to participants' metacognitive awareness development. The results show that there were no significant correlations between listening performance gains and metacognitive awareness development on the five factors (p > 0.05). This suggests that other factors likely contributed to participants' Chinese listening performance gains across the three groups.
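The paired-samples t statistic used for these within-group comparisons is the mean of the pre-to-post differences divided by its standard error. The sketch below illustrates the calculation with hypothetical scores, not the study's data; the function name is an assumption.

```python
import math


def paired_t(pre, post):
    """Return (t, df) for paired pre-test and post-test scores."""
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    t = mean_d / math.sqrt(var_d / n)
    return t, n - 1


# Hypothetical scores for three participants.
pre_scores = [40, 38, 42]
post_scores = [41, 40, 45]
t_value, df = paired_t(pre_scores, post_scores)
print(round(t_value, 3), df)  # 3.464 2
```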

DLPT Listening Results
The descriptive statistics of the three groups' DLPT listening results show that the teacher-led group's mean (M = 4.07; SD = 0.900) was slightly higher than the control (M = 3.87; SD = 0.846) and self-directed (M = 3.71; SD = 1.102) groups' means. This is similar to the groups' CLCT pre-test sequence, where the teacher-led group (M = 41.61) was slightly higher than the self-directed (M = 40.19) and control (M = 39.19) groups. The Levene's test for equality of variances demonstrates that error variances of the dependent variable were equal across groups (F (77) = 2.055, p > 0.05).
An ANCOVA was performed to answer RQ3 while reducing the bias from participants' possible different performances before the MST. In the ANCOVA, the DLPT listening results were set as the dependent variable, the MST group as the independent variable, and the CLCT pre-test score as a covariate to compare the three groups' DLPT listening proficiency. The ANCOVA results presented in Table 8 show that there was no overall statistically significant difference in Chinese DLPT listening results among the three groups after their means were adjusted using the CLCT pre-test scores (F = 0.733, p = 0.484). After the ANCOVA, a Pearson correlation analysis was performed in order to examine relationships between Chinese DLPT listening scores and the five metacognitive factors measured in the post-test of the MALQ. The results presented in Table 9 show that the DLPT listening results were significantly but negatively correlated with the PE factor (r = −0.235, p = 0.041) and the MT factor (r = −0.238, p = 0.039), and not significantly correlated with the other three factors. The negative correlations mean that high DLPT scores were associated with low scores in the PE and MT factors.
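As an illustration of how such a correlation is computed and tested, the following sketch calculates Pearson's r and converts it to the t statistic (with n - 2 degrees of freedom) on which the significance test is based. The function names and data are hypothetical and are not the study's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ssx * ssy)


def r_to_t(r, n):
    """Return (t, df) for testing whether a correlation differs from zero."""
    return r * math.sqrt((n - 2) / (1 - r ** 2)), n - 2


# Hypothetical factor scores and proficiency codes for five participants.
factor_scores = [1, 2, 3, 4, 5]
proficiency = [5, 3, 4, 2, 1]
r = pearson_r(factor_scores, proficiency)
t_value, df = r_to_t(r, len(factor_scores))
print(round(r, 3), round(t_value, 3), df)  # -0.9 -3.576 3
```

A negative r, as in this hypothetical example, means that higher values on one variable tend to go with lower values on the other.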

Results Summary
Overall results showed that there were no statistically significant differences in metacognitive awareness development, listening performance gains, and DLPT listening proficiency among the three groups. However, there were significant differences in PE factor development within each of the three groups, with the self-directed group showing the greatest difference between pre- and post-tests, followed by the teacher-led and control groups. There was also a significant difference in DA factor development within the teacher-led group. Furthermore, in the MALQ descriptive data, the two experimental groups demonstrated more awareness than the control group in most metacognitive factors, except for the two cognitive-focused factors, PS and MT. The self-directed group showed higher scores on DA, and particularly on PE and PK, than the other two groups. The results indicate that the self-directed MST was associated with the largest significant gain on the PE factor and that the teacher-led MST was associated with a significant improvement in DA awareness. Both MST training methods promoted the awareness of metacognitive factors except for the cognitive-focused factors. The different MST methods did not show significant differences in participants' Chinese listening performance gains or in proficiency test results.

RQ1 Findings
It was anticipated that there would be significant differences in metacognitive awareness development among the two experimental groups receiving MST (self-directed and teacher-led) and the control group. The differences were not supported by the statistics found in this study. In other words, there was no apparent development of metacognitive awareness after participating in MST. However, the t-test results indicate that the self-directed MST can bring the highest significant difference on the PE factor and the teacher-led MST can significantly improve DA awareness.
One possible reason for the non-significant overall results could be the differences between the instrument (MALQ) and the training content of MST (MLC). The MALQ is a questionnaire using self-report to assess metacognitive awareness represented by five factors: Problem-solving, planning and evaluation, mental translation, directed attention, and person knowledge. This study did not directly teach the five factors, but used the MLC model, consisting of self-diagnosis, planning, monitoring, evaluation, regulation, and reflection strategies, to provide MST. Although there were some overlaps (planning and evaluation) and similarities (DA factor and monitoring strategy, PK factor and self-diagnosis strategy) between the MALQ and the MLC, there were differences on other factors, such as problem-solving and mental translation in the MALQ, and regulation and reflection in the MLC. Some previous studies used the MALQ to provide MST content (Coskun 2010; Rasouli et al. 2013), which may then lead to a significant positive effect when answering the questionnaire. The MALQ is likely not the best instrument to assess what was taught in an experiment with different MST materials, such as the MLC model in this study.
The second possible explanation for the non-significant results could be attributed to insufficient training time and the teaching quality of the MST in this study. Although 90 min was sufficient for introducing metacognitive strategies to the experiment participants, it may have been insufficient for a positive impact on the change of their metacognitive awareness. The participants might need more time to apply and internalize the strategies learnt, and the positive effects of MST might be reflected after more practice. In previous studies, treatment time was 45 to 50 min per week for five or more weeks (Birjandi and Rahimi 2012; Coskun 2010; Rasouli et al. 2013). For example, Birjandi and Rahimi's (2012) MST lasted six weeks and took 45 min once a week. Aside from the time insufficiency, the intensity of training might not have been enough; the researcher had concerns that the experiment participants would opt to withdraw from the MST if the training was too intense, because they already had a heavy learning workload and other duties on a daily basis. In addition, because participation was completely voluntary, the training lacked accountability, which means there were no follow-up actions if the participants did not practice, record, and submit their strategy application worksheets. The lack of accountability may have influenced the quality of the MST. As a result, the MST appears not to have had an overall impact on the experiment participants' listening metacognitive awareness development.
The t-test results showed that there were significant differences in planning and evaluation factor development for all three groups, including the control group. Planning before listening and evaluating after listening are two strategies commonly used in listening activities and may be taught in listening instruction or improved by learners themselves as they use metacognitive strategies. If participants did not spontaneously improve their planning and evaluation strategies as they developed their listening comprehension skill, then the two strategies were very likely taught to the control group during their routine listening instruction, which may have resulted in the significant gains in this group. The experimental groups' significant improvement could be caused by the MST interventions as well as by routine listening instruction. The two experimental groups, self-directed and teacher-led, both showed higher means than the control group, and the self-directed group showed the highest development on the planning and evaluation factor. Additionally, the planning and evaluation strategies were directly taught in the MLC model, and were measured by the MALQ as a merged factor. This may illustrate that the matching of training content and assessment tools could lead to significant differences between the participants' pre- and post-tests in the MALQ.
Furthermore, the t-test result showed that there was a significant difference in the development of the directed attention factor in the teacher-led group only. A possible explanation for this might be that the importance of the monitoring strategy was particularly emphasized in the teacher-led MST workshops. The monitoring strategy in the MLC is focused on two parts of the listening process: Attention focusing and meaning processing. The first, attention focusing, is very close to the DA factor assessed in the MALQ. When introducing the monitoring strategy, the researcher highlighted that this is the most important but most difficult metacognitive strategy for the listening skill. The special emphasis on the importance of this strategy might have attracted the teacher-led participants' attention to it. This also further speaks to the positive effects that matching MST content with the assessment tool could bring. On the other hand, compared with the PE factor, the difference on the DA factor between the teacher-led group and the other two groups may indicate that students cannot develop the monitoring strategy spontaneously or through self-directed learning, but teacher-led workshops with special emphasis on the strategy will help students raise awareness in this area. The significant differences on PE and DA reflected in the t-test show that the interventions with the two MST methods, self-directed and teacher-led, have a certain effect on metacognitive awareness development, but not on cognitive skills, such as the problem-solving factor in the MALQ, because these were not included in the MLC training content. Literature reviews indicate that promoting metacognitive awareness can improve listening performance and proficiency.
However, since there were no significant differences in metacognitive awareness among the three participant groups, it may very well be that the three groups were actually not different with respect to the other dependent variables in this study, and, therefore, we would not expect any significant effects in RQ2 and RQ3.

RQ2 Findings
The ANOVA results of RQ2 indicated that there were no significant differences in listening performance gains among the three participant groups, although there were significant differences between pre- and post-tests within each of the three groups. A Pearson correlation analysis showed that there was no significant correlation between participants' listening performance gains and their metacognitive awareness development. This finding is consistent with those previous studies that did not discover significant effects of MST (Diebold 2011; Leary 1999; Prestwich 2008; Sterling 2011), but differs from the research showing the significant effects of MST (Coskun 2010; Fan 2009; Movahed 2014; Wang 2009). Fan's (2009) and Wang's (2009) studies, for example, revealed significant effects when using MST combined with cognitive strategy training. Fan (2009) explored the impact of MST on ESL learners' reading comprehension with 143 first-year university students. Her study showed that the participants receiving MST performed better on a reading comprehension test than the students without MST. The three metacognitive strategies taught in her MST were think-aloud, text structure, and summarization. Strictly speaking, text structure and summarization are not metacognitive strategies, but cognitive reading strategies. Similarly, Wang (2009) employed a sequential mixed-method research design to investigate the effects of MST on ESL high school students' reading comprehension, strategy awareness, and motivation. Wang's (2009) dissertation presented a strong research design, but the core content of her experimental treatment was basic reading strategies rather than the metacognitive strategies she originally proposed and presented in her title. Fan's (2009) and Wang's (2009) studies reminded the researcher that MST with a stress on cognitive strategies could result in more positive significant effects on participants' cognitive skills.
However, the MST intervention in this present study focused on metacognitive strategies without integrating any cognitive strategies. This may suggest that an MST focused exclusively on metacognitive strategies, without integrating cognitive strategies, has a limited impact on listening performance, because listening comprehension is a cognitive task and therefore requires cognitive strategy application.
The provision of explicit instruction combining metacognitive and cognitive strategies for MST has been advocated by a number of scholars (Schraw 1998; Veenman et al. 2006; Schneider 2008). Leopold and Leutner (2015) conducted three experiments to compare the effects of cognitive-only strategy training and cognitive and metacognitive combined strategy training on students' learning using scientific texts. Their results showed that the combined training helped students improve performance; the cognitive-only strategy training was not effective. This combined MST effect was observed across their three experiments. Based on the previous studies and the findings of RQ2, this present study makes a new proposal for future research: the effectiveness of MST may be improved by combining it with cognitive strategy training. The cognitive strategies could include pre-listening prediction, skipping unknown parts while listening, and post-listening information compensation strategies. These cognitive strategies can be combined with corresponding metacognitive strategies; for example, prediction can be introduced into the planning strategy, the skipping strategy integrated with the monitoring strategy, and the compensation strategy included in the evaluation strategy.
Regarding the non-significant differences between the self-directed and teacher-led groups, the results of this present study differ from those of Ball (1998) and Manning (2003), but echo those of Leary (1999), Prestwich (2008), and Sterling (2011). Sterling (2011) examined the effects of teacher-led and peer-led MST on community college students' achievement in English, finding that there was no statistical difference between the two methods.

RQ3 Findings
The RQ3 results showed that there was no significant difference in DLPT listening proficiency among the three participant groups. This indicates that the different MST methods did not produce significant differences in listening proficiency in this study.
As with the RQ1 and RQ2 findings, the results for RQ3 might be attributed to insufficient training time and the ineffectiveness of the MST. The RQ1 results showed that there were no significant differences in metacognitive awareness development among the three participant groups. This means the three groups were not different when it comes to metacognitive awareness, and, therefore, one might expect no significant differences in the related dependent variables, including listening proficiency, among the three groups. As discussed in the RQ2 findings, the MST curriculum focused only on metacognitive strategy training and did not provide cognitive strategy training. The MST alone, without connecting corresponding cognitive strategy in training, could have a very limited impact on listening proficiency.
The DLPT, a measure of language proficiency, is not closely aligned with the MST in this study, which trained generic metacognitive strategies. Language proficiency is a comprehensive result of language learning and is influenced by a combination of factors. This quasi-experiment was conducted in the existing classes that had different students, teachers, teaching materials, and daily teaching schedules. These factors might be relevant to the participants' listening proficiency. The participants in the three groups might have had different motivations for improving their listening proficiency and adopted varied listening practices. The teachers in the three groups may have had different knowledge and teaching skills regarding listening instruction and metacognition. The participating classes had different teacher-to-student ratios and utilized different teaching materials and schedules during the experiment. These uncontrollable factors may have influenced participants' listening performance gains and proficiency results. The MST was not necessarily the only factor that impacted the listening performance and proficiency results.
Interestingly, the correlation analysis of RQ3 showed that participants' DLPT listening proficiency negatively correlated with the PE and MT factors. This means that the participants who use the PE less or the MT more could obtain higher results in the DLPT listening test. Within the MALQ, the MT factor represents the translation strategies that skillful listeners must avoid (Vandergrift et al. 2006). In this study, all three items (4, 11, 18) representing the MT factor were reverse coded before data analysis; therefore, the negative correlation indicates that the more MT is used, the higher the listening proficiency, which is consistent with Goh and Hu's (2014) findings. In their multiple regression analysis with 113 participants, they also found that there was a significant negative correlation between the MT factor and listening proficiency test results. Goh and Hu explained that this correlation might relate to their participants' lack of vocabulary and inability to recognize the sounds of words while listening. The repeated negative correlation results in this present study may suggest that the relationship between translation strategy and listening proficiency is not as simple as commonly thought when speaking of metacognition, and needs further research. This suggestion is supported by Liu's (2008) study results, where the translation strategy showed variations between the most and least efficient listeners among advanced, intermediate, and elementary levels.

Conclusions
In this study, the group differences anticipated for the three research questions were not supported by the statistical results. The results indicate that MST does not necessarily show statistically significant effects on metacognitive awareness development, listening performance gains, and proficiency test results. The non-significant results may be attributed to the instruments, MST duration and intensity, and some uncontrollable factors related to the participating classes in a routine language course.
Despite the fact that the results were not as expected, this study answered some unresolved questions and may reveal important principles for future MST design. This study demonstrates that an MST using training materials different from the assessment tool, such as the MLC model in this study, may not necessarily achieve the same significant effects as those with the MALQ as training materials. The results do indicate that the self-directed MST enhances development of students' planning and evaluation awareness more than the teacher-led MST and non-training. The results also suggest that students may not develop their monitoring strategy spontaneously or through self-directed learning, but that teacher-led MST workshops with special emphasis on the importance of the monitoring strategy will help students develop in this area. With respect to previous MST studies, the findings of this study reveal that insufficient training time and MST without the integration of cognitive strategies do not yield significant effects. Therefore, it is suggested that future MSTs should last for a longer period, with sufficient training time and effective follow-ups to ensure their positive effects. In addition, this study proposes a hypothesis for further research: MST effectiveness could be improved by combining it with cognitive strategy training.
Funding: This research received no external funding.

Acknowledgments:
The author would like to sincerely thank the guest editor and anonymous reviewers for their constructive feedback and valuable input. All errors remain her own.

Conflicts of Interest:
The author declares no conflict of interest.

Disclaimer:
The content of this article is the sole responsibility of the author and does not necessarily reflect the official views of, nor is it necessarily endorsed by, the author's affiliation.

Appendix A
Table A1. A sample of application worksheets for the self-directed group.
Week 1: Self-diagnosis
Learning Objectives: 1. Identify your listening problems. 2. Find possible reasons of the problems.
Tasks: Follow instructions of each day, complete assigned activities with 15 min daily.

Instructions | Problems Identified | Possible Reasons | Notes
Day 1: Review the LSDA. Identify and summarize your problems and possible reasons with the LSDA. Fill out the right columns. | | |
Day 2: Listen to a text in homework. After listening, fill out the right columns. Pay attention to linguistic problems. | | |
Day 3: Listen to a text in homework. After listening, fill out the right columns. Pay attention to your listening strategies. | | |
Day 4: Listen to a text in homework. After listening, fill out the right columns. Pay attention to your management strategies, anxieties, and nervousness. | | |
Day 5: Listen to a text in homework. Check the problems identified and summarize patterns. | | |

Appendix C
This rubric was used for quantifying experimental participants' strategy applications recorded on their worksheets in the MST. The strategy applications were scored based on the quantity and quality of strategies used on each day. Quantity is the number of the strategies used on each day. Quality represents how well the strategy is used. Since each day is represented by one row on the recording worksheets, the following rubric is the scoring standard for each row.