Vocabulary Learning Based on Learner-Generated Pictorial Annotations: Using Big Data as Learning Resources

This research discusses the potential of using big data for vocabulary learning from the perspective of learner-generated pictorial annotations. Pictorial annotations lead to effective vocabulary learning, but creating them is challenging and time-consuming. Because user-generated annotations promote active learning, and because data sources on social media platforms in the big data era are not only huge but also user-generated, using social media data to establish a natural and semantic connection between pictorial annotations and words seems feasible. This research investigated learners’ perceptions of creating pictorial annotations using Google images and social media images, learners’ evaluation of the learner-generated pictorial annotations, and the effectiveness of Google pictorial annotations and social media pictorial annotations in promoting vocabulary learning. A total of 153 undergraduates participated in the research: some created pictorial annotations using Google and social media data, some evaluated the annotations, and some learned the target words with the annotations. The results indicated positive attitudes towards using Google and social media data sets as resources for language enhancement, as well as significant effectiveness of learner-generated Google pictorial annotations and social media pictorial annotations in promoting both initial learning and retention of target words. Specifically, we found that (i) Google images were more appropriate and reliable for creating pictorial annotations, and learners therefore achieved better outcomes with annotations created from Google images than with those created from social media images, and (ii) the participants who created word lists that integrate pictorial annotations were likely to engage in active learning when they selected and organized the verbal and visual information of target words by themselves and actively integrated such information with their prior knowledge.


Introduction
Big data is defined as large-volume, high-velocity, and great-variety information assets that require advanced techniques and technologies for data collection, storage, management, and analysis [1,2]. Based on previous initiatives of defining big data, Gandomi and Haider [3] summarized five characteristics of big data, namely volume, velocity, variety, veracity, and value. Specifically, (1) volume refers to the magnitude of data, (2) velocity refers to the rate of data generation and the speed of data processing, (3) variety refers to the structural heterogeneity and complexity of data, (4) veracity refers to the unreliability inherent in some data sources, and (5) value refers to the attribute that large volumes of data accumulate high value despite their low value density [3]. The attitude of the academic community towards the use of big data in education is positive in general [4]. It is widely agreed that the implementation of big data in education is associated with promising opportunities such as improved quality of academic programs, better understanding of students' learning processes, teaching and learning customized to student needs, enhanced learning experiences, reduced dropout rates, and improved learning performance [5]. A number of researchers also believe that big data and learning analytics promise teachers and students a new age of education by helping educators to collect information in real time, track learning processes, assess learning progress, predict future performance, spot potential issues, and provide personalized instruction [6,7]. Furthermore, the use of big data in education has been extended to other sub-research areas. For example, virtual reality (VR) and augmented reality (AR) techniques have been widely employed to provide immersive and interactive learning experiences for science education and language learning [8][9][10].
As suggested by Pikhart and Klímová [11], the recent advancement of artificial intelligence (AI) and big data techniques will facilitate the rapid transformation of e-learning systems and applications to a new era called eLearning 4.0.
In the field of language education and research, the emerging interest in big data and learning analytics has also led to great opportunities, one of which is personalization in language learning [4]. The other is the use of big data as learning resources for language education, for example, the use of videos, songs, movies, and social media. Social media is regarded as a platform to transform teaching and learning practices [12]. It allows students to learn regardless of time and space and integrates formal and informal learning [13]. It is the most attractive platform to acquire knowledge and information for Generation Z (i.e., learners born after the year 2000) [11]. It also serves as a learning tool for information sharing and collaborative learning [14]. Studies have examined diverse uses of social media tools in education. It was found that social media platforms were effective learning tools for improving English for EFL teachers and students at Hong Kong secondary schools [15]. Similarly, Lai and Tai [16] found common use of self-initiated social media activities in language education, which can improve learning motivation and language proficiency.
Many applications of big data in language education are associated with multimedia learning, and a considerable proportion of the practices of integrating multimedia into language learning resources is associated with annotations for vocabulary learning. The literature also indicates that multimedia annotations are very effective in promoting vocabulary learning [17]. In this study, we focus on pictorial annotations, a kind of multimedia annotation that uses images from different sources (e.g., Google image search or social media platforms) to interpret and depict the semantic meanings, connotations, and denotations of target words. However, few practices and investigations of pictorial annotations for vocabulary learning involve students' proactive participation in generating the annotations, although keeping word lists with textual annotations is common among language learners [18]. This indicates an inconsistency between the common practice of student-generated word lists with textual annotations and the widely acknowledged advantages of pictorial annotations. That is, on the one hand, the academic community emphasizes the effectiveness of pictorial annotations in promoting vocabulary learning and the importance of active learning; on the other hand, most language learners create their own word lists with textual annotations rather than pictorial annotations. In the present project, we therefore asked learners to create pictorial annotations and investigated the effectiveness of learning with them.
Sustainability 2021, 13, 5767
Specifically, we asked learners to search for images from Google images and social media for the annotation creation, as these are the two main sources of images. We asked two groups of learners to create pictorial annotations using these two image sources and another group of learners to evaluate these annotations. In this way, we analyzed students' perceptions of creating pictorial annotations with the two image sources and their evaluations of these two types of pictorial annotations. To investigate the effectiveness of the two types of pictorial annotations in promoting vocabulary learning, we also asked two groups of learners to learn with the two types of pictorial annotations, using the vocabulary learning performance of a group of learners who learned with textual annotations as a baseline. Thus, in this research, we investigated the application of big data in language education from the qualitative dimension, in terms of learners' perceptions of using big data in the creation of pictorial annotations, and from the quantitative dimension, in terms of learners' vocabulary learning performance using pictorial annotations created with big data. Our research questions are listed as follows.

1. Which source of big data do students prefer for pictorial annotation creation, Google images or images from social media? What may be the possible explanations for such results?
2. Which type of pictorial annotations do students highly rate, the annotations created with Google images or those with images from social media? What may be the possible explanations for such results?
3. Which type of pictorial annotations promote better vocabulary learning, the annotations created with Google images or those with images from social media? What may be the possible explanations for such results?

Big Data in Language Learning
The term big data has commonly been used to refer to the extremely rapid growth of data volume on the Web, although there is no universal and consistent definition of the term in the data science communities [19,20]. From a technological perspective, the main characteristics of big data are V3 (i.e., high Velocity, high Volume, and high Variety), as big data is "driven in part by social media and the Internet of Things (IoT) phenomenon" [21] (p. 293). Specifically, social media platforms such as YouTube, Twitter, and Facebook create a paradigm of user-generated data, while IoT technologies such as wireless communications, 5G technologies, and GPS have established the infrastructure that enables data access, generation, and organization. How fast do big data grow? "2.5 quintillion bytes of data are created and 90 percent of the data in the world today were produced within the past two years" [22] (p. 97). More recently, the value of big data has been acknowledged and further applied to various domains including healthcare [23], manufacturing [24], digital media [25], the e-learning environment [26], and so on.
Research on big data in language education can be divided into two main categories. The first focuses on learning analytics for language learners, the data of which include learning behavioral data, learning logs, learner profiles, learning preferences, performance, and so on. For example, Hsiao, Lan, Kao, and Li [27] visualized learning paths by collecting data on the learning processes of college students who were studying Mandarin vocabulary in a virtual world. Zou and Xie [28] developed a personalized vocabulary learning system that recommends learning tasks through a learner profile established according to the technique feature analysis theory. The second category aims to integrate big data platforms and/or big data sources into language learning processes. For example, Liu and Lan [29] examined the interaction patterns of negotiation on Facebook between EFL learners and native speakers. Xie, Zou, Lau, Wang, and Wong [30] identified the preferred topics of language learners on their social media platforms for generating personalized vocabulary tasks. Pikhart and Klímová [11] studied eLearning 4.0, which utilizes state-of-the-art big data and AI techniques, and applied this new paradigm to second language acquisition for Generation Z language learners.

Google and Social Media
Google and social media platforms are quite different in terms of information creation and retrieval. Google employs a crawler to collect information sources from millions of websites, organizes the information sources in a database, and extracts key features to build indices for keyword searches. Essentially, the core idea of a Google search algorithm like PageRank is to identify the most authoritative information sources in response to an issued query [31]. In contrast, social media platforms allow users to create, share, and disseminate their own content, including individual opinions, life stories, and so on. The information sources on these platforms are personalized and subjective [32]. Therefore, information retrieval on social media platforms aims to access highly personalized information that matches users' interests [33,34]. Because of these differences in information retrieval and creation, we investigate and compare the impact of using images from Google and from social media platforms as pictorial annotations for vocabulary learning in this study.

Multimedia Annotations for Vocabulary Learning
The importance of annotations (or glosses) for effective vocabulary uptake from reading has been widely acknowledged in the research community of second language vocabulary acquisition [35,36]. With the advent of technology-enhanced language learning, annotations have also developed from the traditional format, which involves only short textual definitions that aim to facilitate reading comprehension, to multimedia formats that present various aspects of word knowledge through "different modalities (textual, visual, and auditory) and modes (video, picture, and text)" [37] (p. 136). Researchers in general argue that pictorial annotations, compared to textual annotations, are more effective in enhancing reading comprehension, promoting students' uptake of word meanings, and meeting learner needs and preferences [38][39][40][41]. Specifically, Chun and Plass [42] found that it was easier to learn words associated with images or actual objects than words without such associations. Jones and Plass [43] also reported that dual annotations, which involve both text and images, were significantly more effective than single annotations that involve only text or images. In addition to the two basic types of annotations, textual only and textual plus pictorial annotations, Yeh and Wang [44] further investigated textual plus pictorial plus audio annotations and compared these three annotation types, the results of which showed that textual plus multimedia annotations were the most effective. Similarly, Turk and Ercetin [45] emphasized that presenting visual and verbal information simultaneously led to better learning than allowing students to select the types of annotations they want. Interestingly, studies that compared annotations integrating static pictures with annotations involving dynamic animations revealed conflicting findings. Ikeda [46] and Lin and Chen [47] found annotations with dynamic pictures more effective, yet Ariew and Ercetin [48] argued the opposite.
Additionally, Boers et al. [17] found that pictorial annotations induced more attention from learners to target words and hence promoted better learning.
The large number of studies on the benefits of multimedia (text plus static pictures) annotations has also prompted numerous attempts to explain their effectiveness. Three of the most frequently cited theoretical frameworks are Mayer's generative theory of multimedia learning [49][50][51][52], Schmidt's noticing hypothesis [53][54][55], and Craik and Lockhart's levels of processing theory [56]. Although many studies have been conducted to investigate the pictorial annotations provided by teachers or material designers, little is known about learner-generated pictorial annotations. Additionally, language learners are often encouraged to keep notes of unfamiliar words that they encounter while reading and go over them periodically to obtain a strong and durable memory trace [18]. As one of the most widely adopted word learning strategies, such a practice of creating personalized word lists, however, does not normally involve multimodality. Word lists usually comprise only the target words and simplified definitions, with sample phrases or sentences sometimes included. Few learners try to, or are advised to, integrate images into their word lists, even though the effects of pictorial annotations on vocabulary acquisition have been widely acknowledged. There is therefore a call for research that examines learner-generated word lists that integrate pictorial annotations.

Research Design
Motivated by previous research on big data in language learning, Google and social media, and multimedia annotations for vocabulary learning, as introduced in Section 2, this study draws on theories of big data, language acquisition, and multimedia learning as its methodological background. Moreover, as many of the studies reviewed in Section 2 used a mixed-methods design that integrates quantitative and qualitative data, the same approach was employed in this study.
A total of 153 undergraduates, who were speakers of Chinese and learned English as a foreign language, participated in the research. They were upper-intermediate learners according to the Common European Framework of Reference for Languages (CEFR). These students were randomly assigned to five groups and completed different learning tasks. Their ages ranged from 18 to 22, and they had learned English for around 8 years on average. They were all non-English majors at a local university in Hong Kong, where the experiment was conducted in 2019. Ninety of the participants were female, and they were equally and randomly assigned to the five groups.
The 30 students in Group 1 (the annotation-creation group) created two sets of multimedia annotations for 10 target words, one set using images from social media data and the other using Google data. The target words included burglarize, grin, inflammation, ostensible, procrastination, rake, shatter, shiver, tumble, and wrath. These words are all tangible to the senses and can be easily imagined. Moreover, according to Davies's Corpus of Contemporary American English (COCA) [57], these words fall outside the 6000 most frequently used words; thus, they are likely to be unknown to the participants, who are intermediate English learners and know approximately 3000 words. The textual annotations for the words were provided in the worksheets, and the participants searched for appropriate and accurate images that depicted the semantic meanings of the words from Google images and three social media platforms, namely Twitter, Instagram, and Facebook. In sum, 600 learner-created pictorial annotations were produced, with each of the 30 participants generating 10 annotations for the 10 target words using Google data and 10 annotations using social media data. These participants were interviewed afterwards about their experiences of annotation creation and image search. Guided questions included whether it was easy to search for images that appropriately depicted target word meanings, and whether the images presented by the data sources were relevant, reliable, appropriate, and interesting. The participants were also encouraged to share their overall perceptions, feelings, and experiences freely. The interviews were audiotaped to record all details.
To examine the extent to which the images retrieved from Google Images and social media depicted the meanings of the target words vividly, a group of 30 participants evaluated the 600 learner-created pictorial annotations and rated them from 1 (very inappropriate and inaccurate) to 5 (very appropriate and accurate). Before conducting the evaluation, these participants in Group 2 (the annotation-evaluation group) were trained by two experts in English language education to evaluate the appropriateness and accuracy of pictorial annotations. They were also interviewed about their evaluation of the pictorial annotations.
Based on the scores of the learner-generated pictorial annotations, as given by the participants in Group 2, the two experts selected two sets of pictorial annotations for the 10 target words. One set was created using Google images, and the other using social media images. The selection criterion was that the chosen annotations had the highest scores for both appropriateness and accuracy in depicting the meanings of the target words.
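The score-based part of this selection can be illustrated with a minimal sketch. It assumes, hypothetically, that every rating is stored as a (word, source, annotation identifier, score) record on the 1 to 5 scale; the record layout, the function name best_annotations, and the sample ratings are illustrative only and are not taken from the study, in which the two experts made the final choice.

```python
from collections import defaultdict
from statistics import mean


def best_annotations(ratings):
    """Return {(word, source): annotation_id} with the highest mean rating.

    `ratings` is an iterable of (word, source, annotation_id, score) tuples.
    This record layout is an assumption made for illustration.
    """
    scores = defaultdict(list)
    for word, source, annotation_id, score in ratings:
        scores[(word, source, annotation_id)].append(score)

    best = {}
    for (word, source, annotation_id), values in scores.items():
        avg = mean(values)
        key = (word, source)
        # Keep the first annotation seen when mean ratings tie.
        if key not in best or avg > best[key][1]:
            best[key] = (annotation_id, avg)
    return {key: annotation_id for key, (annotation_id, _) in best.items()}


# Hypothetical ratings for one target word (not data from the study).
sample = [
    ("grin", "google", "a1", 5), ("grin", "google", "a1", 4),
    ("grin", "google", "a2", 3), ("grin", "google", "a2", 4),
    ("grin", "social", "b1", 2), ("grin", "social", "b2", 4),
]
selected = best_annotations(sample)
```

Grouping by word and source before ranking yields one candidate per target word for each image source, which mirrors the two sets described above.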
To evaluate the effectiveness of these two sets of pictorial annotations in promoting vocabulary learning, two groups of participants were asked to learn with the annotations, and their learning outcomes were measured. There were 31 participants in Group 3 (the experiment group who learned with Google pictorial annotations) and 32 in Group 4 (the experiment group who learned with social media pictorial annotations).
As a control group, another group of 30 students learned with simple textual annotations, which were the same as those provided to the participants in Group 1 and used by those in Groups 3 and 4. All three groups of participants learned the target words by reading a text in which the 10 target words were integrated and learned the words with associated annotations. The annotations for Group 5 (the control group) were simple textual annotations, those for Group 3 were Google pictorial annotations, and those for Group 4 were social media pictorial annotations.
The participants decided how they would like to search for and select the images, and the search and selection procedures depended fully on their preferences. They were encouraged to search for images freely by issuing queries to Google (or to the search portals of the social media platforms, depending on their assigned groups) and then to select the images that best matched and expressed the meaning of a target word.
These three groups of participants' prior knowledge of the target words was measured through a pre-test before the intervention, and their initial learning and retention of the words were measured through an immediate post-test right after the intervention, and a delayed post-test 1 week later. Additionally, these participants were interviewed about their experiences of learning the target words with the annotations.

Assessment and Scoring
We conducted a pre-test to evaluate the participants' prior knowledge of the target words and immediate and delayed post-tests to examine the development of their word knowledge. These participants were from Group 3 (the experiment group who learned with Google pictorial annotations), Group 4 (the experiment group who learned with social media pictorial annotations), and Group 5 (the control group who learned with textual annotations). To evaluate their development of receptive and productive knowledge of the meanings of the target words after learning with the pictorial annotations, we applied Folse's modified version [58] of Paribakht and Wesche's vocabulary knowledge scale [59] as the assessment tool and adopted Zou's scoring system [60,61]. A meaning was graded zero if it was incorrect and given a full score if it was correct. A sentence was graded zero if the word was used in a semantically incorrect way, given a half score if the usage was semantically correct but grammatically incorrect (e.g., 'Don't procrastination. It's harmful'.), and given a full score if it was both semantically and grammatically correct (e.g., 'Procrastination can lead to feelings of guilt'). Two trained raters scored the answers independently, and Pearson's r for inter-rater reliability was 0.99 for the pre-test, 0.93 for the immediate post-test, and 0.95 for the delayed post-test.
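The scoring rules and the reliability check can be sketched as follows. The source does not state the numeric value of a "full score", so full=1.0 is an assumption, as are the function names; pearson_r is the standard product-moment formula, and the two rater score lists are hypothetical.

```python
from math import sqrt


def score_meaning(correct, full=1.0):
    """Meaning item: zero if incorrect, full score if correct."""
    return full if correct else 0.0


def score_sentence(semantically_correct, grammatically_correct, full=1.0):
    """Sentence item: zero, half, or full score under the rules above."""
    if not semantically_correct:
        return 0.0
    return full if grammatically_correct else full / 2


def pearson_r(x, y):
    """Pearson product-moment correlation between two raters' scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# "Don't procrastination. It's harmful." is semantically acceptable but
# grammatically incorrect, so it earns half of the (assumed) full score.
half = score_sentence(True, False)

# Hypothetical scores from two raters for five answers (not study data).
rater_a = [1.0, 0.5, 0.0, 1.0, 0.5]
rater_b = [1.0, 0.5, 0.0, 1.0, 1.0]
agreement = pearson_r(rater_a, rater_b)
```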

Collection and Analysis of Interview Data
We collected three types of interview data in this research. Firstly, we interviewed the participants in Group 1 (the annotation-creation group) to collect data of their experiences of annotation creation and image search. The guided questions for the interview focused on what the participants thought about the idea of using Google data and social media data as resources for language enhancement, what they felt about searching for images from Google data and social media data, and how they perceived the images as presented by Google images and social media.
We also collected data of the participants' evaluation of the learner-generated pictorial annotations created with social media data and Google data through interviewing the participants in Group 2 (the annotation-evaluation group) and the two experts in English language education. The guided questions for the interview focused on the general appropriateness and accuracy of the images from Google data and social media data for depicting the meanings of the target words.
Moreover, we collected data on the participants' experiences of learning the target words with different types of annotations through interviews after the delayed post-test. Nine participants from Group 3 (the experiment group who learned with Google pictorial annotations), Group 4 (the experiment group who learned with social media pictorial annotations), and Group 5 (the control group who learned with textual annotations) were interviewed, with three from each group. The guided questions for the participants in the two experimental groups focused on whether the pictorial annotations facilitated their learning, and how they felt about learning with the images from Google data and social media data.
The sample guided questions are listed in Appendix A. We first transcribed the interview data verbatim and then analyzed them to understand the features of Google data and social media data. We then conducted text analysis of the transcribed texts by finding evidence of the participants' common perceptions, feelings, and opinions about the learning procedures. Following Zou's previous study [58], all data were first skimmed to note distinct features, with a focus on which the relevant literature was reviewed. This process was cyclical: we repeatedly re-examined the data and the literature.

Perceptions of Google Data and Social Media Data for Multimedia-Annotations Creation
The interview results of the 30 students in Group 1 (the annotation-creation group) showed five main perceptions of the participants concerning the use of Google data and social media data for multimedia annotation creation (see Table 1). Firstly, over 90% of participants considered the idea of using Google and social media data as resources for language enhancement interesting and creative, and secondly, over 80% of them considered it feasible and reliable. Other positive comments concerning the use of Google and social media data for language education, which appeared very frequently in the interview transcripts, included "I like this way of learning" and "it's awesome". One participant also suggested, "Learning should be integrated into life, and if social media can be well used for learning, it will be very helpful, especially for learners with low motivation". Such results indicated that the participants were aware of the potential benefits of using big data for language education.
Table 1. Participants' perceptions of Google data and social media data.
[Table body not recovered. Column headers: Google Data; Social Media Data. First row: The idea of using this data set as resources for language enhancement is interesting/creative/fun.]
Moreover, the participants generally agreed that it was easy to create pictorial annotations using Google and social media data. Specifically, almost all participants found it easy to create pictorial annotations by searching Google data for images that appropriately depict target word meanings, while a lower percentage of them (slightly below 70%) found it easy to do so using social media data. Furthermore, over 85% of participants considered the images presented by Google images relevant to the keywords that they entered for search, but fewer than 65% of participants felt so while searching for images from social media data. Additionally, over 75% of participants considered Google data appropriate, yet only 40% of the participants described social media data as such. One participant commented, "I can see clear differences between Google images and social media images; Google is more reliable, and social media is very personal". These results indicated that overall, the participants showed positive attitudes towards developing language learning resources using Google and social media data, but they considered Google data more appropriate, more relevant, and easier to use than social media data.

Scores of Learner-Generated Google Pictorial Annotations and Social Media Pictorial Annotations
The descriptive statistics for the scores of learner-generated annotations, as displayed in Table 2, demonstrated that the mean score of Google pictorial annotations (M = 4.21, SD = 0.67) was higher than that of social media pictorial annotations (M = 3.42, SD = 0.70). To further test whether Google pictorial annotations were rated significantly higher than social media pictorial annotations, an independent samples t-test was applied, the results of which indicated statistically significant differences (t = 14.04, df = 598, p < 0.001, r = 0.49), as presented in Table 3. Such results indicated that the participants considered the learner-generated Google pictorial annotations much more appropriate and accurate than social media pictorial annotations. It is also noteworthy that the interview results of the 30 students in Group 1 (the annotation-creation group), the 30 participants in Group 2 (the annotation-evaluation group), and the two experts in English language education all indicated differences between the annotations for the six verbs and those for the three nouns and one adjective. This is perhaps because the meanings of the verbs (i.e., burglarize, grin, rake, shatter, shiver, and tumble) are more concrete than those of the nouns and adjective (i.e., inflammation, procrastination, wrath, and ostensible) in this research. To test this hypothesis, we compared the scores of the learner-generated annotations for the verbs with those for the nouns and adjective. The results, as shown in Table 4, revealed large differences: the scores of the annotations for the verbs were much higher (M = 4.19, SD = 0.67) than those of the annotations for the nouns and adjective (M = 3.25, SD = 0.62).
Table 4. Scores of the learner-generated annotations for verbs and nouns and adjective.
As a follow-up test of whether significant differences existed between the scores of the annotations for verbs and those for the nouns and adjective, we conducted another independent samples t-test, the results of which showed statistical significance (t = 17.22, df = 598, p < 0.001, r = 0.58) (see Table 5). That is, the participants considered the learner-generated annotations for words with concrete meanings much more appropriate and accurate than those for words with abstract meanings.
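The effect sizes reported with the two t-tests appear consistent with the common conversion r = sqrt(t² / (t² + df)). The source does not state which formula was used, so this is an assumption, and the function name effect_size_r is illustrative. A minimal sketch that recomputes r from the reported t and df values:

```python
from math import sqrt


def effect_size_r(t, df):
    """Convert a t statistic and its degrees of freedom to the effect size r.

    Uses r = sqrt(t^2 / (t^2 + df)); that the study used this conversion
    is an assumption, not something the source states.
    """
    return sqrt(t ** 2 / (t ** 2 + df))


# df = 598 matches 600 rated annotations minus 2 in both comparisons.
r_image_source = effect_size_r(14.04, 598)  # Google vs. social media
r_word_class = effect_size_r(17.22, 598)    # verbs vs. nouns and adjective
```

Under this conversion, the second value rounds to the reported 0.58, and the first falls within 0.01 of the reported 0.49.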

Learning Performance of the Participants Who Learned with Textual Annotations, Google Pictorial Annotations, and Social Media Pictorial Annotations
The descriptive statistics, as shown in Table 6, demonstrate three main findings. Firstly, all three types of annotations were conducive to effective learning of the target words in terms of both immediate learning (the means of the immediate post-tests were 8.34, 7.23, and 6.09 for Google pictorial, social media pictorial, and textual annotations, respectively) and delayed retention (the means of the delayed post-tests were 6.84, 5.50, and 3.93, respectively), considering that the participants' pre-knowledge of the words was close to zero. Secondly, the participants who learned with pictorial annotations achieved better learning outcomes than those who learned with textual annotations, as indicated by their scores on the immediate and delayed post-tests. Thirdly, the participants who learned with Google pictorial annotations had the highest scores in both immediate and delayed post-tests. Further, we conducted two one-way ANOVA tests, which are widely used in similar studies [51,52], to examine whether any significant differences existed among the three annotation types in their effectiveness in promoting vocabulary learning. The data met all assumptions for performing ANOVA.
Concerning the participants' scores in the immediate post-test, the significance values of the tests of normality of the data were 0.07, 0.053, and 0.07, respectively, which were greater than 0.05 and indicated that the data were normally distributed. The data also met the requirements of homoscedasticity, homogeneity of variances and regression slopes, etc. As presented in Table 7, the results of the one-way ANOVA test of the participants' scores in the immediate post-test showed statistically significant differences among the three groups (F = 13.07, p < 0.001). Table 8 also shows that Google pictorial annotations promoted significantly more effective vocabulary learning than both social media pictorial annotations (p = 0.02) and textual annotations (p < 0.001). The differences between the effectiveness of social media pictorial annotations and that of textual annotations were also statistically significant (p = 0.04). In sum, the mean differences among the three groups were significant at the 0.05 level.
Concerning the participants' scores in the delayed post-test, the significance values of the tests of normality of the data were 0.058, 0.06, and 0.12, respectively, indicating that the data were normally distributed. The data also met the requirements of homoscedasticity, homogeneity of variances and regression slopes, etc. As presented in Table 9, significant differences were identified among the three groups (F = 15.32, p < 0.001). The results shown in Table 10 also suggested that after 1 week, the participants who learned with Google pictorial annotations still performed significantly better than those who learned with social media pictorial annotations (p = 0.03) and textual annotations (p < 0.001). In addition, the participants who learned with social media pictorial annotations performed significantly better than those who learned with textual annotations (p = 0.01). In sum, the mean differences among the three groups were significant at the 0.05 level.
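For readers who want to see how the F statistic of a one-way ANOVA is obtained, a minimal sketch follows. It computes only F and the degrees of freedom from raw scores; the p-values and pairwise comparisons reported above are not reproduced. The function name one_way_anova_f and the three small score lists are hypothetical and are not data from the study.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-groups sum of squares: spread of the group means.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-groups sum of squares: spread of scores around their group mean.
    ss_within = sum(
        sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means)
    )

    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical post-test scores for three small groups (not study data).
textual = [1, 2, 3]
social_media = [2, 3, 4]
google = [5, 6, 7]
f_value, df_between, df_within = one_way_anova_f(textual, social_media, google)
```

The source does not report the degrees of freedom; with 31, 32, and 30 participants in Groups 3, 4, and 5, they would be 2 and 90 under this formulation.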
The participants in Group 3 (the experimental group who learned with Google pictorial annotations) and Group 4 (the experimental group who learned with social media pictorial annotations) also reported in the interview that the Google and social media images were conducive to their learning of the target words.
In summary, the results of this research indicated that the participants considered the idea of using Google and social media data sets as resources for language enhancement interesting and creative. They also felt that Google data were more appropriate, relevant, and easy to use than the social media data. The scores of the learner-generated Google pictorial annotations and those of the social media pictorial annotations, as given by the participants in the annotation-evaluation group, also showed that the participants considered the learner-generated Google pictorial annotations much more appropriate and accurate than social media pictorial annotations. Moreover, the participants' learning performance indicated that Google pictorial annotations were significantly more effective than social media pictorial annotations in promoting learners' initial learning and retention of the target words.

Perceptions of Using Google Data and Social Media Data for Learning Resource Creation
We found that the participants in the annotation-creation group considered Google data more appropriate, relevant, and easy to use than social media data in this research. The participants in the annotation-evaluation group also rated the annotations created with Google images higher, compared to those created with social media images. A possible reason for such results is that social media data are more subjective. Typical social media platforms (e.g., Facebook, Instagram, Twitter, blogs, and YouTube) are all designed to allow people to share their personal feelings, ideas, and life experiences, so the social media data are closely associated with the context in which the users create them, the correct interpretation of which requires comprehensive consideration of the many factors that are related to it. De-contextualization of social media images tends to result in loss of meanings or misinterpretation, as the situations, events, feelings, or information related to the images all contribute to the expression of particular meanings in certain contexts. Thus, when social media images are used in other contexts that are different from the original contexts where the images are created, the information may not be conveyed accurately. However, this is common when social media data are used for learning resource creation. Moreover, different users may interpret an image from varied perspectives. The same image may seem appropriate to some learners for the expression of a word's meaning, but inappropriate to others. For example, some participants considered the following pictorial annotations, which were created by a participant using an image from Twitter in our research, to be very accurate and appropriate, yet some felt they were very inaccurate and inappropriate (see Figure 1).
Figure 1. Pictorial annotation created with a Twitter image for the word ostensible (adj.): seeming to be true, but not necessarily so. The image is from https://twitter.com/hashtag/ostensible?lang=cs (accessed on 4 October 2019).
Compared to the social media images, the participants showed more positive attitudes towards Google images, considering them less subjective and more reliable. This is likely because the Google ranking systems consist of a whole series of algorithms, through which they can sort billions of data to identify the most relevant results, with factors such as webpage relevance and usability, source expertise, content quality, context, and setting being taken into account [59]. Thus, Google images tend to be more independent from the contexts of the web pages from where the images originate compared to social media images. The participants therefore felt that Google images were more appropriate and reliable for pictorial annotations creation, and they achieved better outcomes when learning with the annotations created with Google images.
Another noteworthy finding is that the learner-generated annotations for words with concrete meanings were much more highly rated than the annotations for words with abstract meanings. This indicates that it is easier to create annotations for words with concrete meanings than those with abstract concepts, and the multimedia-annotation-enhanced vocabulary learning approach is more appropriate for the learning of words with concrete meanings. We also noticed in this research that some participants tried to concretize words of abstract concepts through additional self-generated explanations in the textual glosses, an example of which is presented in Figure 2. This facilitates learners' contextualization of the abstract concepts and comprehension of the word meanings to some extent.

Effectiveness of Google Pictorial Annotations and Social Media Pictorial Annotations in Promoting Language Learning
The results of this research showed that pictorial annotations were significantly more effective than textual annotations in promoting learners' initial learning and retention of the target words. This aligns with the multimedia principle, which argues that people can learn more deeply from text and images than from text alone [60]. We also found that Google pictorial annotations were significantly more effective than social media pictorial annotations. This is likely because Google pictorial annotations can better help learners understand the meanings of the target words, as Google images depend less on the contexts of the web pages from which they originate. Such findings indicate that the effectiveness of pictorial annotations in promoting learning is associated with the quality of the annotations, one indicator of which is the extent to which the images depict the meanings of the target words accurately and appropriately. Given the difficulty of depicting words of abstract concepts, the strategy of concretizing these words through additional self-generated contextualized explanations in the textual glosses seems feasible. Many participants in this research did apply this strategy in annotation creation and found it helpful. The participants who evaluated the learner-generated annotations also commented that annotations accompanied by contextualized explanations of the images could convey the meanings of the target words more accurately and seemed more conducive to effective learning.

Potential of Using Big Data as Language Learning Resources
The learners were aware of the potential of using big data for language education in general. Most participants in this research considered the idea of using Google and social media data sets as resources for language enhancement interesting, creative, and feasible. Many of them also mentioned in the interview that images could help them visualize the target words and learn better. Moreover, many learners have the habit of creating their own word lists, which normally consist of the unfamiliar words that they have encountered in reading [18]. Therefore, it seems feasible to ask learners to create multimedia word lists rather than textual word lists, taking into account the results of this research, which indicate the participants' interest in using Google and social media images to create pictorial annotations.
Additionally, active word learning happens when learners proactively connect verbal and visual information of target vocabulary with their prior knowledge and create their own lexical associations [49,50,62]. The participants who created word lists that integrate pictorial annotations were likely to engage in active learning when they selected and organized the verbal and visual information of target words by themselves and actively integrated such information with their prior knowledge. These features are unique to learning activities that involve learner-generated pictorial annotations and play an important role in promoting effective learning. The research results also showed that the participants repeatedly evaluated which images better matched the target words, and none of them reported heavy cognitive load or excessive mental effort. Thus, the potential of using big data as language learning resources is significant.

Challenges of Using Big Data as Language Learning Resources
The quality of big data is unstable, which makes it challenging to use big data effectively for high-quality education. The results of this research showed that the scores of Google pictorial annotations were higher than those of social media pictorial annotations, and the learners who learned with Google pictorial annotations achieved better learning outcomes than those who learned with social media pictorial annotations, indicating that the quality of learning resources influences the learning outcomes. Moreover, the interview results of this research showed that the participants were aware of the possible influences of the data quality on learning outcomes. Many participants reported concerns that their annotations may not be professional enough to be qualified learning resources. Thus, because of the unstable quality of big data and the importance of the quality of learning resources, quality assurance is essential for the use of big data as language learning resources. However, it is difficult, if not impossible, to control the quality of user-created big data, and it is consequently challenging to assure the quality of learner-created pictorial annotations. The underlying reason is that, as discussed in Section 2.2, Google is used to identify the most authoritative information sources for responding to an issued query, whereas social media platforms are mainly used to access and share highly personalized information of individual interest.

Conclusions
We investigated the use of big data as learning resources for language education in this research from three perspectives: learners' perceptions of creating pictorial annotations using Google images and social media images, learners' evaluation of the learner-generated pictorial annotations, and the effectiveness of Google pictorial annotations and social media pictorial annotations in promoting vocabulary learning. The results indicated positive attitudes towards using Google and social media data sets as resources for language enhancement, as well as significant effectiveness of learner-generated Google pictorial annotations and social media pictorial annotations in promoting both initial learning and retention of target words. Specifically, we have found that (i) Google images were more appropriate and reliable for pictorial annotation creation, and therefore annotations created with them achieved better learning outcomes than annotations created with images from social media, and (ii) the participants who created word lists that integrate pictorial annotations were likely to engage in active learning when they selected and organized the verbal and visual information of target words by themselves and actively integrated such information with their prior knowledge.
One important implication of this study is that a computer-assisted tool is necessary to help teachers and learners identify resources that are relevant to their preferences when creating annotations for vocabulary, as Google and social media platforms provide learners with many irrelevant and noisy resources. Another implication is that learner engagement is an essential factor in developing learning tasks with big data. The learner-generated pictorial annotation is a good example of engaging learners, as it involves a deep level of processing of the target words through searching, filtering, comparing, and ranking candidate pictures.
This research is limited in that it only investigated the use of big data for vocabulary learning, rather than the general development of language skills. However, vocabulary knowledge is the fundamental element of language knowledge, so the effectiveness of big data for vocabulary knowledge development can, to some extent, indicate the potential of the effectiveness of big data for language enhancement. Moreover, this research focused on the use of big data for the creation of pictorial annotations, while big data can be applied in many other language learning areas through a wide range of approaches; thus, more in-depth understanding of the potential of big data for language education ought to involve a larger number of studies from more varied viewpoints. Furthermore, the current study does not provide a feasible solution for improving data quality of social media pictorial annotations. It is feasible to define a conceptual framework for both learners and teachers for selecting high-quality learning resources. In addition, the adoption of hardware devices including mobile phones [63], wearable devices [64] and so on can be an important factor for further investigation. Future research is therefore advised to be conducted in such directions.

1. To what extent do you agree that it is easy to search for images that can appropriately depict the target word meanings?
2. To what extent do you agree that the images presented by the data sources are relevant, reliable, appropriate, and interesting?
3. To what extent do you agree that the images from Google data and social media data can depict the meanings of the target words accurately and properly?
4. To what extent do you agree that the pictorial annotations can facilitate your learning of the target vocabulary?
5. What factors or features related to the pictorial annotations do you find useful for your learning of the target vocabulary?
6. What are your feelings, perceptions, or experiences about the learning process?
7. To what extent are you satisfied with the learning approach?
8. Do you have any other comments on the learning process and approach?