Chinese University Students’ Experience of WeChat-Based English-Language Vocabulary Learning

The worldwide outbreak of COVID-19 in 2020 posed tremendous challenges to higher education. Teaching English as a foreign language (TEFL) is among the many areas affected by the pandemic. The unexpected transition to online teaching has made it harder to improve or retain students’ language proficiency. WeChat, a popular social application in China, was widely used for TEFL at Chinese universities before COVID-19. However, it remains unclear whether the use of WeChat can facilitate Chinese university students’ English-language lexical proficiency during the pandemic. To fill this gap, the aim of the present study was two-fold: (1) it explored the relationship between the students’ lexical proficiency and three variables, namely their academic years, genders, and academic faculties/disciplines; and (2) it evaluated the effectiveness of a WeChat-assisted lexical learning (WALL) program in facilitating English-language vocabulary learning outcomes. One hundred and thirty-three students at a university in Northern China participated in the WALL program for three weeks. The results indicated that the independent variables had no correlation with the students’ lexical proficiency. More importantly, the students’ test scores declined after using the program, compared to their initial test scores, and the effect size of the difference was medium. The findings raise further questions about applying WeChat to vocabulary teaching in a large-scale transition. The study is expected to provide insights for tertiary institutions, language practitioners, and student stakeholders to troubleshoot the potential problems regarding implementing WeChat-based TEFL pedagogies.


Introduction
The COVID-19 pandemic has caused massive disruptions to educational institutions worldwide. With the advent of COVID-19 control measures, China mandated nationwide school closures at the end of February 2020 [1]. An emergency policy launched by the Chinese government, "Suspending Classes without Stopping Learning", shifted higher education online [2]. This initiative minimized the impact of the pandemic on education [3] and ensured that learning was not disrupted at any point during the lockdown [4]. Online or web-based teaching, serving as a substitute for traditional classroom teaching, became the mainstream mode of delivering teaching from late February 2020 [5]. Owing to China's successful control measures, by June 2020, a large number of university students were able to return to campuses and resume their studies in a blended mode [6].
Mobile learning (m-learning) refers to the phenomenon that mobile devices are applied for learning purposes [7]. As an integration of m-learning and language learning, mobile-assisted language learning (MALL) assists or enhances language learning in formal and/or informal environments, using handheld mobile devices, such as mobile phones or smartphones [8]. In spite of the widely recognized advantages of using MALL, there were reports on unequal learning performances of some university students during the transition [9]. This calls for an investigation on MALL in the current global crisis context [10]. To address this gap, the present study aims to explore the factors influencing the lexical learning outcomes in the transition. It also evaluates the effectiveness of a WeChat-based MALL program used for English-language vocabulary learning at a Chinese university during the pandemic.
Numerous challenges during the pandemic, including technology accessibility, Internet connectivity, socioeconomic status, institutional support, and learner experience, have left MALL effects uncertain [11]. The adoption of MALL-based pedagogies in higher educational systems in developing countries, such as China, also remains questionable [12]. This study thus first examines potential factors that might affect students' English-language vocabulary proficiency. It also examines the effectiveness of equipping Chinese university students with a WeChat-based MALL program for English-language vocabulary learning under current circumstances.
To address the research objectives mentioned above, the paper first reviews related work, primarily on MALL effects and WeChat-based language learning/teaching practices at the Chinese tertiary level, especially during the pandemic. It then presents the WeChat-based MALL program that was used as an intervention. Next, the paper explains that the quantitative data were collected using two sets of English-language vocabulary tests. The test results were analyzed using Mann-Whitney U tests and Kruskal-Wallis tests to determine the relationships between the students' test scores and several independent variables, namely their academic years, genders, and academic faculties/disciplines. A paired-sample t-test was then used to determine whether the two sets of test scores differed statistically significantly. Where a significant difference was found, the effect size was calculated to further measure the difference. Finally, the paper sheds light on a number of potential reasons that may have contributed to the findings.

Related Work
Despite the wide prevalence of MALL during the COVID-19 period, studies of it have received comparatively little attention in academia [13]. Nevertheless, m-learning has been a preferred mode for online language teaching practices during the pandemic [14]. MALL studies at the higher educational level focused primarily on participants' perceptions. For example, [15] surveyed the reactions of 100 students and 45 teachers to the transition to mobile Russian language learning. Likewise, Chinese students held the supportive view that MALL approaches made learning easier in their university English course, particularly during the lockdown [10]. However, these studies mainly explored students' and educators' beliefs and opinions about shifting to MALL paradigms rather than investigating their practical effects. Other studies, such as [16] and [17], analyzed the affordances of implementing QQ and WeChat as MALL tools for university English courses facing the transition. The authors, however, largely put forward pedagogical implications for designing and applying MALL-based TEFL models, instead of measuring their actual effectiveness. Another study showed that voluntary out-of-class MALL enhanced French-language learning outcomes through self-regulated training and scaffolding [18]. However, the study did not clarify whether the experiments were conducted during the transition, nor did it calculate the effect size of the treatment.
As shown above, the evaluation of actual MALL effects on TEFL during the pandemic has been somewhat overlooked, especially in the Chinese university context. For example, one recent study reported that mobile multimedia tools, including laptops and tablets, had positive impacts on Chinese university students' English learning outcomes under current situations [19]. However, the findings did not specify which language teaching area(s) benefited. Vocabulary is the most researched language-teaching area in MALL studies [20], and mediated vocabulary learning outcomes were investigated with quantitative, qualitative, and/or mixed-methods approaches in a wealth of studies before the COVID-19 period [21]. Nevertheless, the evaluation of mobile-assisted vocabulary learning effects in Chinese universities during the transition has received seemingly little emphasis. Moreover, little empirical evidence has scrutinized the effectiveness of specific MALL media/platforms, such as WeChat.
WeChat has gained high popularity in Chinese universities during the pandemic [22]. However, little research has explored using WeChat for university MALL practices amid the pandemic. One existing piece of evidence is that WeChat has fostered students' enjoyment through emotion regulation in online collaborative English writing activities [23]. However, research on MALL and lexical proficiency has been sparse in the transition. Ref. [24] claimed that factors including user groups, academic contexts, and the applications in use largely contributed to successful MALL practices, particularly in vocabulary learning/teaching. Therefore, when evaluating MALL effects, it is necessary to consider the factors mentioned above, because they potentially led to students' uneven MALL outcomes during the pandemic [25].

Research Aim and Questions
As previously discussed, the effectiveness of applying WeChat to vocabulary learning in the chosen context remains under-researched. To bridge this gap, 133 students at a Northern Chinese university participated in a WeChat-assisted lexical learning (WALL) program for a period of three weeks. The study addresses the following research questions (RQs):
RQ One: Do students' academic years, genders, and academic faculties/disciplines have impacts on their English-language vocabulary learning outcomes?
RQ Two: How well does the WALL program assist students to achieve their English-language vocabulary learning outcomes?

Research Design
The reported study is part of a larger PhD project. The research method underpinning the PhD project is a mixed-methods research design, including open-ended questionnaires, tests, and semi-structured interviews. The reported study was designed as the pilot study. It primarily focused on the quantitative data collected through two sets of vocabulary test papers. These tests included a Diagnostic Test administered before the WALL program and a Follow-Up Test administered afterwards. Differences between the students' test scores collected before and after using the program were examined. The effect size was also calculated to determine whether the difference was large, medium, or small. The program duration was about three weeks, from 24 May to 21 June. This duration was chosen because interventions or treatments shorter than one month were conducive to learners' improvement [26]. For longer programs, fatigue might appear [27] as the novelty effect wears off [28].

Participants
A purposive sampling strategy was applied to select the subject academic institution, sample faculties/disciplines, and sample participants in this study. Such a method saves time and effort, is suitable for a case study, and serves the research objectives well [29]. Firstly, the university where the study was conducted is a key comprehensive academic university in Northern China. University students were purposively chosen as the sample, because they are physically and mentally mature enough to manage and discipline their learning tasks and pace [30]. Learning in universities is more independent than at other educational levels [31]. Besides, this study was conducted via a WeChat public account on mobile phones, and China has a large population of mobile phone users at universities [32]. Secondly, the selection of faculties/disciplines was also purposive. Among the 24 faculties/disciplines in total, four were purposively selected as representative: the School of Architecture, School of Chemistry, School of Information Technology, and School of Media and Communication. Thirdly, the Year 1 and Year 2 students from the four faculties/disciplines were purposively chosen, because non-English majors at most Chinese universities usually finish studying English courses at the end of Year 2. Initially, 150 students were recruited in a nonrandomized way [33]. Regarding the recruitment process, an invitation letter was posted on the websites of the faculties/disciplines after approval was received. Students who volunteered for this study were contacted by the researchers via email. Potential participants in naturally formed classes were chosen as a convenience sample without randomization [34]. They were then given more details about this study before they made the final decision on their voluntary participation.

Demographic Backgrounds of the Participants
In total, 150 students took the two sets of vocabulary tests online. One hundred and thirty-three completed and submitted both tests. Details of the students' demographic background information are presented in Table 1.

Research Instruments
The WeChat-based MALL WALL program
The WALL program was designed and developed by the authors for university English vocabulary learning. It comprised learning content and materials, daily practice and drills, and additional learning resources. The delivered content consisted of texts, audio, and video clips, covering a wide range of knowledge regarding a particular location, such as natural landscape, wildlife, social life, lifestyles, architecture, and cultures. The program was delivered through the public account service of WeChat. Students received daily notifications from the public account via the WeChat app on their mobile phones.
The English-language vocabulary test papers
Two sets of English-language vocabulary tests were used as the measurement tool. They were administered and collected online. In Phase 1, the Diagnostic Test, circulated before the program, was used to identify the students' initial lexical knowledge. The test paper had 25 multiple-choice questions. Each question contained one target lexical item and three distractor lexical items. Of the resulting 100 lexical items (25 questions with four options each), 80 were words and 20 were phrases. The lexical items were all extracted from the latest formal test papers of the College English Test: Band-4 (CET-4) (retrieved from http://cet.neea.edu.cn/html1/folder/1608/1178-1.htm, accessed on 19 May 2020) and selected based on the English teaching syllabus at the university. For example, "The college students in China are ____ from smoking on campus because this will do them no good. A. discouraged B. observed C. obeyed D. obtained". Another example was "After talking for nearly ten hours, he ____ to the government's pressure at last. A. expressed B. yielded C. decreased D. approved". In Phase 2, the Follow-Up Test, administered after the program, was used to examine the students' mediated lexical knowledge. The test paper was designed in the same way as the former one. The lexical items in this paper were all based on the learning materials and content delivered in the program. One example was as follows: "The wombat is a large ____ found only in Australia. A. carnivore B. arthropod C. marsupial D. reptile". Another example was as follows: "The town of Binalong Bay is ____ the southern end of the beautiful Bay of Fires. A. next to B. situated at C. near D. far from". All lexical items were carefully selected according to the lexicon requirements in the College English Curriculum Requirements [35], compulsory active lexical items [36], and word frequencies [37].
Moreover, three experienced English-language teachers from the university were consulted to check all the target lexical items online before the program commenced. The practical relevance and difficulty levels of the lexical items were also verified.
Vocabulary tests were used as the test instrument to measure students' language proficiency and served as a vocabulary learning outcome indicator in this study for the following reasons. Firstly, vocabulary is the foundation of any language, and vocabulary education is a vital link in the chain of language acquisition [38]. Lexical ability is regarded as more useful in English learning and teaching objectives [39] and plays a critical role in students' future careers in China [40]. Secondly, the glossary in the College English Curriculum Requirements is taken as a testing standard for lexical knowledge and as the criterion and norm reference for university English-language lexicons [41]. Thirdly, tests are a comparatively easy and labor-saving way to measure learners' vocabulary skills efficiently and accurately [42].

Data Analysis
The collected data were analyzed using the Statistical Package for the Social Sciences (SPSS) version 26.0. Mean values and standard deviations of the students' test scores were first calculated for descriptive statistical analyses. Next, Mann-Whitney U tests were conducted to determine whether the students' test scores differed statistically significantly by two independent variables, namely academic years and genders. Then, Kruskal-Wallis tests were applied to examine whether the test scores differed significantly across academic faculties/disciplines. Finally, a paired-sample t-test was adopted to explore statistically significant differences between the two sets of test scores, namely the Diagnostic Test and the Follow-Up Test. A significance value smaller than 0.05 in the paired-sample t-test indicates a significant difference between the students' scores on the two tests, which would suggest that the WALL program had an impact on the students' lexical proficiency. Otherwise, no significant difference exists between the two sets of test scores, and the WALL program would be taken to have had no impact on the students' lexical proficiency. Where a significant difference was found, the effect size was calculated using Cohen's d [43] to determine whether the difference was large (d = 0.80), medium (d = 0.50), or small (d = 0.20).
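The final two steps of the procedure above can be illustrated with a minimal Python sketch. It is not the SPSS analysis itself. The scores are hypothetical, the function names `paired_t_and_d` and `label_effect` are invented for illustration, and Cohen's d is assumed here to be the mean difference divided by the standard deviation of the differences, since the text does not state which variant was used. The "negligible" label for values below 0.20 is likewise an assumption.

```python
import math
from statistics import mean, stdev


def paired_t_and_d(before, after):
    """Return (t, degrees of freedom, Cohen's d) for paired observations.

    Assumption: d = mean difference / standard deviation of the differences.
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have equal length")
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))
    d = mean_diff / sd_diff
    return t, n - 1, d


def label_effect(d):
    """Map |d| onto Cohen's conventional benchmarks (0.20, 0.50, 0.80)."""
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "negligible"  # label assumed for values below the small benchmark


# Hypothetical paired scores for illustration only; not the study's data.
before = [48, 60, 36, 52, 44, 64, 40, 56]
after = [40, 44, 36, 48, 28, 52, 44, 40]
t_stat, df, d = paired_t_and_d(before, after)
effect = label_effect(d)
```

The sketch stops at the t statistic and its degrees of freedom. Obtaining the significance value additionally requires the t distribution, which a statistics package such as SPSS supplies.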

Descriptive Analysis of the Test Results
The descriptive results of the two tests are first presented in this section. Table 2 shows the results of the Diagnostic Test and the Follow-Up Test, respectively. The mean value of the Diagnostic Test was 44.32, while the mean value of the Follow-Up Test was 31.10. These mean values showed that, overall, the students failed both tests, since 60 points (out of 100) is generally regarded as a passing score in China. The individual participants' scores were also less dispersed on the Follow-Up Test (standard deviation = 14.147) than on the Diagnostic Test (standard deviation = 22.835). Thus, although the students' Follow-Up Test scores varied less, their test performance was less satisfactory as a whole. Additionally, according to the frequencies of the test scores in Table 3, 28.6% (N = 38) of the 133 students passed the Diagnostic Test by scoring 60 points or more, while only 3.8% (N = 5) passed the Follow-Up Test. A decline was thus found in the students' vocabulary proficiency test scores. That is to say, the students scored lower after using the program than before they used it.
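The descriptive measures used in this section can be reproduced with a short Python sketch. The pass rates are recomputed from the counts reported in the text (38 and 5 out of 133). The score list, the `describe` helper, and the `PASS_MARK` constant are hypothetical and serve only to show the calculation. The sample standard deviation (n - 1 denominator) is assumed.

```python
from statistics import mean, stdev

PASS_MARK = 60  # passing score out of 100, as stated in the text


def describe(scores):
    """Return the mean, sample standard deviation, and pass rate (%) of a score list."""
    passed = sum(1 for s in scores if s >= PASS_MARK)
    return {
        "mean": mean(scores),
        "sd": stdev(scores),
        "pass_rate": 100 * passed / len(scores),
    }


# Hypothetical scores for illustration only; these are not the study's data.
example_scores = [24, 36, 44, 60, 72, 28, 40, 64]
summary = describe(example_scores)

# Pass rates recomputed from the counts reported in the text (N = 133).
diagnostic_pass_rate = round(100 * 38 / 133, 1)  # 28.6
follow_up_pass_rate = round(100 * 5 / 133, 1)    # 3.8
```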

Analysis of the Two Sets of Test Scores by Independent Variables
Analysis of the Test Scores by Academic Years
Firstly, the Year 2 students had a higher average score on the Diagnostic Test (mean value = 47.11) than the Year 1 cohort (mean value = 42.30). This was possibly because the Year 2 students, in most cases, had more exposure to vocabulary learning and teaching than their Year 1 counterparts. However, neither group scored 60 points or more on average. Next, a Mann-Whitney U test indicated no statistically significant difference in the Diagnostic Test scores between the academic years, since the p-value (0.372) was larger than 0.05. Secondly, the Year 1 students performed better on the Follow-Up Test (mean value = 32.26) than the Year 2 group (mean value = 29.50). The Year 1 students could be more likely to follow the learning requirements. They could possibly be more hardworking and maintain better learning effectiveness, since they had only recently graduated from high school. However, neither group scored 60 points or more on average. Then, a Mann-Whitney U test found no statistically significant difference in the Follow-Up Test scores between the academic years, since the p-value (0.098) was larger than 0.05. It could be stated that the students' academic years had no impact on the two sets of test scores.
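For readers unfamiliar with the Mann-Whitney U test, the following Python sketch shows how the statistic and an approximate two-sided p-value can be obtained for two independent groups. It is an illustration under stated assumptions, not the SPSS procedure: the p-value uses the normal approximation with a tie correction and no continuity correction, the helper names are invented, and the two score lists are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U, two-sided p) using the normal approximation with tie correction."""
    n1, n2 = len(group_a), len(group_b)
    combined = list(group_a) + list(group_b)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    n = n1 + n2
    # Count how often each score occurs so that ties shrink the variance.
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p


# Hypothetical scores for two cohorts; not the study's data.
year_one = [40, 36, 52, 28, 44, 60]
year_two = [48, 56, 32, 64, 44, 68]
u_stat, p_value = mann_whitney_u(year_one, year_two)
no_significant_difference = p_value > 0.05
```

With groups this small an exact p-value would normally be preferred. The approximation is kept here because the study's groups were considerably larger.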

Analysis of the Test Scores by Genders
Firstly, the female students had a much higher score on the Diagnostic Test (mean value = 64.56) than their male counterparts (mean value = 43.61). The female students passed the test on average, while the male students did not. Female students generally showed better language learning outcomes and greater devotion to schooling. Next, a Mann-Whitney U test found no statistically significant difference in the Diagnostic Test scores between the genders, since the p-value (0.457) was larger than 0.05. Secondly, the female students had a slightly higher score on the Follow-Up Test (mean value = 31.33) than the male students (mean value = 30.37). Female students, again, performed slightly better on the test. Neither group, however, passed the test. A Mann-Whitney U test then showed no statistically significant difference in the Follow-Up Test scores between the genders, since the p-value (0.769) was larger than 0.05. It can be observed that the students' genders had no impact on the two sets of test scores.

Analysis of the Test Scores by Academic Faculties/Disciplines
Firstly, the results showed that the students from the School of Media and Communication had the highest mean score on the Diagnostic Test (mean value = 46.29) among the four academic faculties/disciplines, followed by those from the School of Chemistry (mean value = 46.18) and the School of Information Technology (mean value = 44.92). The students from the School of Architecture had the lowest mean score (mean value = 41.33). This was possibly because arts students generally performed better in English study than science and engineering students. However, all four groups failed the test, since their mean scores were lower than 60 points. The Kruskal-Wallis test on the scores by academic faculties/disciplines indicated that the students from the School of Information Technology had the highest mean rank (mean rank = 70.27), while the students from the School of Architecture had the lowest (mean rank = 61.71). The result indicated no statistically significant association between the students' Diagnostic Test scores and their academic faculties/disciplines (χ² = 1.151, df = 3, p-value = 0.765 > 0.05). Secondly, the students from the School of Information Technology scored the highest on the Follow-Up Test (mean value = 33.75) among the four academic faculties/disciplines, followed by those from the School of Chemistry (mean value = 31.82) and the School of Media and Communication (mean value = 9.916). However, the students from the School of Architecture had the lowest mean score (mean value = 28.51). The students from the School of Information Technology possibly had better test scores because they were more familiar with and adept at using mobile technologies. They could thus take full advantage of the program. However, all groups failed the test again, since their mean scores were lower than 60 points.
The Kruskal-Wallis test on the Follow-Up Test scores by academic faculties/disciplines showed that the students from the School of Information Technology had the highest mean rank (mean rank = 71.35), while the students from the School of Architecture had the lowest (mean rank = 61.62). There was no statistically significant association between the students' Follow-Up Test scores and their academic faculties/disciplines (χ² = 1.401, df = 3, p-value = 0.705 > 0.05). It could be concluded that the students' academic faculties/disciplines had no impact on the two sets of test scores.
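The p-values reported for the two Kruskal-Wallis tests can be cross-checked from the reported chi-square statistics alone. The Python sketch below uses the closed-form upper-tail probability of the chi-square distribution with 3 degrees of freedom, which holds for that specific number of degrees of freedom only. The function name `chi2_sf_df3` is hypothetical, and the calculation is a check on the published figures rather than a rerun of the analysis.

```python
import math


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom.

    Closed form for df = 3: erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2).
    """
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Chi-square statistics as reported in the text (df = 3 in both cases).
p_diagnostic = chi2_sf_df3(1.151)
p_follow_up = chi2_sf_df3(1.401)
```

Rounded to three decimals, both values agree with the reported p-values of 0.765 and 0.705.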

Paired-Sample t-Test Result of the Two Sets of the Test Scores
A paired-sample t-test was conducted to compare the 133 students' scores on the two vocabulary proficiency tests and to verify whether a statistically significant difference existed between them. According to the paired-sample t-test result (as shown in Table 4 below), a statistically significant difference was found between the students' two sets of test scores, since the reported significance value was 0.000 (i.e., p < 0.001), which was smaller than 0.05. That is, the 133 students' scores on the two tests differed statistically significantly. Additionally, the effect size was calculated to further measure the effect of the WALL program on the test scores. Cohen's d indicates the magnitude of the effect size, namely d = 0.20 for a small effect, d = 0.50 for a medium effect, and d = 0.80 for a large effect [43]. The WALL program had a medium effect size of 0.50. That is, the WALL program brought about a medium difference in the students' test scores.

Discussions
The previous results initially indicated that a number of potentially influential factors/variables, including the students' academic years, genders, and academic faculties/disciplines, had no correlation with the test scores. More importantly, the students' two sets of test scores differed statistically significantly. However, it was discouraging to see the decline in the test scores after the delivery of the WALL program. Therefore, it can be argued that, among the variables examined, the program was the only one that had a negative impact on the 133 students' test performances. That is to say, the students did not improve or retain their lexical proficiency, even though they had used the program for three weeks during the pandemic. Contrary to the findings of Reference [24], which reported positive MALL effects by synthesizing 80 recent worldwide publications before the pandemic, the results of this study showed that the MALL approach did not improve or retain students' lexical proficiency in the transition. Negative results, however, were not unusual according to previous evidence [44]. Several reasons could possibly explain the unsuccessful intervention of the WALL program.
First, it was a challenge for a considerable number of students to embrace the new MALL approach. It was true that, before COVID-19, MALL pedagogies were welcomed [45], and mobile phones or smartphones were the most widespread mobile devices for MALL practices [46]. However, according to the statistics shown on the WeChat public account, the students in this study hesitated to engage with the program. Specifically, not all the students viewed the learning materials and content, performed the daily practice and drills, or read the additional learning resources. This happened probably because they had doubts about integrating WeChat with the regular teaching syllabus in the face of the unexpected transition.
This was consistent with the findings of [6] that around half of the participants remained uncertain about the effects of the new learning approach. Students would have a strong intention to conduct MALL activities only if they perceived them to be markedly useful [10].
Second, limited technical skills and knowledge hindered students from using MALL tools. Owing to the large population of mobile phone users in China, it is hard to see a student without a mobile phone on university campuses [47]. The generation of digital natives was familiar with, and not irritated by, using mobile devices for multifaceted purposes [48]. In addition, most Chinese university students would like to use mobile technologies for learning activities during the pandemic [49]. The subject university was equipped with an on-campus high-speed network and stable Internet connectivity. Moreover, the program was designed with the costs of mobile data and the challenges of network speed in mind. However, certain students in this study were not familiar with fully applying the WeChat public account for vocabulary learning. They were possibly uncertain and confused about using the program to the fullest, from reading learning materials in multiple forms and doing daily practice and drills to viewing more learning resources. It could also be overwhelming and complex for some students to download texts, audio, and video clips from the program for future use. As mentioned previously, the vocabulary-learning content and materials were purposively designed in various forms. Language learning, however, could be impeded by redundancy and working memory load if students are exposed to the concurrent delivery of the same learning information in different modes [50].
Third, students had difficulty concentrating on MALL activities. The students in this study could possibly be drawn into social chats when logging in to the public account via WeChat. They would then get immersed in chatting with their friends rather than learning with the program. This happened because WeChat is a social application in which messages pop up automatically. Meanwhile, notifications and new information from other subscribed public accounts could also distract them from engaging with the program. When scrolling down the list of subscribed public accounts, the students' attention could be drawn to other appealing information, such as games. Since the learning activities in this study were carried out in an informal learning setting, the students were likely to be disturbed by trivial matters. Undoubtedly, environments free from distractions were conducive to enhancing MALL outcomes [49].
Fourth, students had difficulties participating in MALL activities. As an indispensable element in educational practices, teachers had crucial impacts on students' learning performances [51]. In this study, the students experienced reduced engagement in the lexical learning activities when using the program in an informal/non-classroom learning setting. This happened because of the absence of language teachers, who often supervise students or remind them of learning tasks in formal/classroom learning environments. Students mostly conducted autonomous or self-directed learning during the pandemic [52]. That is, using the program in the absence of teacher-led instruction required self-regulation and self-discipline from the students. However, as the statistics of the public account showed, numerous students did not use the program autonomously until their language teachers asked them to.
Fifth, students' negative emotions could possibly have hindered the MALL effects. Students, when learning online, were prone to negative emotions, such as frustration, anxiety, learning slackness, and weariness [14]. In this study, the students mostly had to learn the lexical items and perform the learning activities on the program on their own. Therefore, they would possibly feel disconnected and distant as a result of limited interaction, a lack of physical proximity, and isolation from peers. Such issues are common in informal online learning practices [53]. Moreover, the students had weaker intention and motivation to learn vocabulary with the program, as their excitement about participating in MALL activities could be temporary [54]. Therefore, the students in this study would possibly have less satisfactory results when their enthusiasm decreased during the period of program delivery.

Limitations and Suggestions for Future Research
The study has several limitations. Firstly, considering the small sample size of the case study, the findings could be less representative of situations at other Chinese universities. The authors would suggest that further studies use a larger sample size and a wider geographical range to provide a more comprehensive picture of the researched topic. Since the present project was conducted in an academic university context, investigations in vocational colleges are also suggested. Secondly, since the university in this study allowed the students to return to the campus before the intervention started, the students used the program for lexical learning in a mixed learning mode. They might have conducted learning activities using the program either in or out of the classroom. Findings could thus be different if the setting was completely formal or informal. The authors call for future studies to consider different physical locations/environments under these circumstances. Thirdly, the present study selected the participants by using a nonrandomized sampling method. The authors would suggest that future studies apply randomization in participant recruitment to verify the research findings. Fourthly, the present paper primarily investigated the quantitative findings, since it was part of a larger PhD project. The students' opinions and perceptions of using the program for MALL practices during the pandemic are explored in a separate paper by the authors, using a mixed-methods research design, including open-ended questionnaires and semi-structured interviews. That paper will be drafted and submitted soon. Fifthly, considering that MALL approaches have been widely used for a broad range of language skills [55], the present study, which focused specifically on vocabulary, calls for future studies to investigate different language teaching areas.
Lastly, the intervention duration was around one month, owing to the time conflict with the final-exam week at the university. Since intervention durations have been found to impact the MALL effectiveness [56], future studies are expected to conduct a longitudinal empirical study with longer intervention durations.

Conclusions
The present study evaluated the effectiveness of a WeChat-based program in facilitating students' vocabulary learning outcomes at a Northern Chinese university. Two sets of English-language vocabulary tests were used to measure the students' lexical proficiency. The findings initially indicated that the independent variables, namely the students' academic years, genders, and academic faculties/disciplines, had no impact on their vocabulary test performances. Moreover, it was found that the students' test scores declined after they had used the WALL program for a period of approximately three weeks. The difference between the two sets of test scores was of medium effect size. It can be stated that the effectiveness of MALL in the Chinese higher-education context remains debatable amid the pandemic.