Article

TED Talks and the Textbook: An In-Depth Lexical Analysis

by Naheen Madarbakus-Ring 1,* and Stuart Benson 2
1 CEGLOC, Institute of Humanities and Social Sciences, University of Tsukuba, Tsukuba 305-8577, Japan
2 Center for Language Research, University of Aizu, Aizuwakamatsu 965-0006, Japan
* Author to whom correspondence should be addressed.
Languages 2024, 9(10), 309; https://doi.org/10.3390/languages9100309
Submission received: 29 April 2024 / Revised: 8 September 2024 / Accepted: 10 September 2024 / Published: 24 September 2024

Abstract

The development of TED Talks textbooks has been a welcome addition to English for Academic Purposes (EAP) pedagogy. The textbooks offer educators and learners a suitable framework for practicing all four language skills (i.e., listening, reading, speaking, and writing). However, the use of TED Talk resources could create specific vocabulary challenges for learners as they progress through each unit in the textbook. Research suggests that although textbook frameworks encompassing listening resources benefit learners with a familiar lesson approach, differences in vocabulary load and in the presence of academic vocabulary and multiword units (MWUs) between the chosen resources and the textbook itself could lead to comprehension difficulties for learners. This study investigates the vocabulary of 12 TED Talks included in the commercial textbook Keynote 2 to understand the lexical profile, vocabulary load, and the academic and multiword unit coverage of each of the chosen listening texts. The results showed that the TED Talk selections and the textbook provided inadequate vocabulary practice, limited academic vocabulary exposure, and a lack of item repetition for learners. The study suggests the inclusion of ideal supplementary materials and appropriate TED Talk selections to help provide educators with suitable guidance to support their learners’ varying vocabulary knowledge.

1. Introduction

In English Language Teaching (ELT), the textbook provides educators with suitable frameworks, guidance, and language when teaching English for Academic Purposes (EAP) courses. Specifically, the textbook provides learners with a primary source of vocabulary that is integrated into the textbook content to help them maximize their learning. Sun and Dang (2020) noted the potential of using textbook lessons to help learners notice and evaluate the most useful words needed for their language learning. Recently, the development of commercial textbooks using resources such as TED Talks (e.g., Keynote and World English) has emphasized the need to understand the lexical demands of these standalone resources. Although textbooks provide well-developed and structured instruction for educators, it remains unknown whether these additional resources are lexically appropriate for learners when completing the units’ listening activities.
L2 research has now turned to investigating the suitability of authentic resources in listening to understand the chosen input included in various textbooks. Morrow (1977) defined authentic resources as “a stretch of real language produced by a real speaker or writer for a real audience and designed to carry a real message of some sort” (p. 270). As a result, publishers have commissioned textbook writers to include authentic resources in their selections. For example, Cengage Learning has published five textbook titles, which include TED Talks as the language input (see http://ngl.cengage.com/ted, accessed on 10 September 2024). Elk (2014) advocated the suitability of TED Talks as a listening resource for learners, drawing on the similarities between the lecture-style components of lecturers and presenters and their use of PowerPoint visuals to explain concepts. Given the popularity of these resources, it is surprising that few studies have examined the suitability of TED Talks for listening input.
Specifically, vocabulary in authentic resources could be problematic for learners. Schmitt and Schmitt (2014) highlighted the importance of measuring the lexical coverage of items by their frequency bands (i.e., high/mid/low frequency). In turn, recent studies (see Benson and Madarbakus-Ring 2021; Sun and Dang 2020; Yang and Coxhead 2020) have used Nation’s (2012) BNC/COCA 25,000 lists to analyze vocabulary by its frequency level. Similarly, studies using the academic word list (AWL) (Coxhead 2000) have investigated academic vocabulary in TED Talks, with results indicating a coverage of around 4% (see Coxhead and Walls 2012; Nurmukhamedov 2017). However, few studies have considered the representation of multiword units (i.e., lexical chunks that consist of two or more words) in texts. Identifying the potential difficulty of such lexical items, categorized by their generality (i.e., high-/mid-/low-frequency items) or specialism (i.e., academic, technical, or multiword units) in texts, could help teachers provide learners with the tools they need to address their comprehension difficulties more successfully. Therefore, this study investigates the lexical profile, vocabulary load, and academic and multiword unit coverage of the 12 chosen TED Talks used in the Cengage Learning textbook, Keynote 2. The analysis considers the lexical frequency, vocabulary difficulty, and the coverage of different word types (i.e., academic and multiword units), and examines the suitability of these resources for vocabulary learning when listening to the chosen TED Talks in textbooks.

2. Literature Review

Previous studies have investigated the vocabulary load (Sun and Dang 2020), word frequency (Matsuoka and Hirsh 2010), and vocabulary knowledge (Yang and Coxhead 2020) that learners need to understand vocabulary input in commercial textbooks. A common approach to investigating the vocabulary load in textbooks is using West’s (1953) General Service List (GSL). The GSL consists of the first 2000 word families and helps researchers identify high-frequency words in a text. Schmitt and Schmitt (2014) also recognized the need to identify mid-frequency and low-frequency words, suggesting that broader investigations examining the lexical coverage of texts are needed. As a result, studies have used Nation’s (2012) British National Corpus/Corpus of Contemporary American English (BNC/COCA) word lists that contain the 25,000 most frequent words for more comprehensive investigations of high-/mid-/low-frequency words in textbooks. Understanding the lexical coverage and the word frequency of textbooks through these approaches can help to support language learners.

2.1. Vocabulary Load in Textbooks

Nation (2006) describes the lexical coverage in texts as “the percentage of running words in the text known by learners” (p. 61). Numerous studies (see Benson and Madarbakus-Ring 2021; Laufer 1992; Van-Zeeland and Schmitt 2013) have investigated the vocabulary load of texts using two lexical coverage thresholds: 95% and 98% coverage. In listening activities, Van-Zeeland and Schmitt (2013) noted that a 95% threshold indicates “good but not necessarily complete” comprehension for learners, while a 98% threshold represents “very good comprehension” (p. 475). In textbooks specifically, five studies used Nation’s (2012) BNC/COCA 25,000-word list to investigate the vocabulary load to understand their lexical coverage. The results are shown in Table 1.
Table 1 shows the number of word families (i.e., all words related to the root word are counted as one) that learners need to know to understand each of the commercial textbooks. Warnby (2023) acknowledged that although the thresholds reported in previous studies have varied between 86.7% and 96.7%, an understanding of between 95% and 98% of words is frequently chosen as the essential threshold to measure learners’ vocabulary knowledge. Specifically, these studies suggest that learners need knowledge of between 2000 and 4000 word families to reach 95% lexical coverage. Although Hajiyeva’s (2015) analysis of university textbooks and Yang and Coxhead’s (2020) New Concept English analysis showed that learners required a larger vocabulary of around 4000 word families, understanding the requirements at this lower threshold helps teachers to support learners that possess varying vocabulary knowledge. Considering the 98% lexical coverage, the same studies showed a broader variation in reaching this higher threshold. Yang and Coxhead (2020) found that a lower figure of 5000–6000 word families was needed for the New Concept English titles. Sun and Dang’s (2020) study on the Yilin series and Hajiyeva’s (2015) university textbook showed that knowledge of the first 9000 word families is needed to achieve 98% coverage. More surprisingly, Matsuoka and Hirsh (2010) found that the New Headway (Upper Intermediate) textbook content never reached 98% lexical coverage. These results indicate that learners need a broad range of vocabulary knowledge when using commercial textbooks irrespective of the assigned proficiency level, suggesting that further research is needed to understand the different lexical demands that individual textbooks place on learners.

2.2. Vocabulary Frequency in Textbooks

Another useful approach is understanding the vocabulary frequency included in commercial textbooks. Schmitt and Schmitt (2014) divided vocabulary frequency into three categories, high frequency, mid-frequency, and low frequency, which can be defined as follows (Schmitt and Schmitt 2014):
  • High frequency: First, second, and third 1000 word families (e.g., go, buy, and watch).
  • Mid-frequency: Fourth to eighth 1000 word families (e.g., academic, frequent, and octopus).
  • Low frequency: Ninth group upwards (e.g., outlandish, florescent, and azalea).
As defined above, the vocabulary frequency bands divide Nation’s (2012) BNC/COCA 25,000 into 3 categories that can help better understand learners’ required vocabulary knowledge. From these 3 categories, Macalister and Nation (2020) commented on the importance of choosing texts that have high-frequency and wide-ranging input for learners. Specifically, Yang and Coxhead (2020) also described how these content and function words are usually the majority of high-frequency words included in English texts and, consequently, will be the words that learners identify the most. Although various studies have investigated high-frequency words in texts (see Table 1), Schmitt and Schmitt (2014) suggested that further research is needed to understand the mid-frequency and low-frequency word requirements (i.e., by creating word lists) to facilitate supporting learners’ comprehension. The importance of providing opportunities for learners to acquire more difficult and complex mid-frequency and low-frequency words was also noted by Mo and Bi (2024). For example, knowledge of mid-frequency words can often support learners in their reading fluency and, while knowledge of low-frequency words is rarely needed, as these items are infrequently used in the text, identifying such items can determine the potential vocabulary difficulties that may exist for learners (Yang and Coxhead 2020). Sun and Dang (2020) advocated how categorizing Nation’s (2012) BNC/COCA lists into the 3 high-/mid-/low-frequency categories provides a more representative analysis of the lexical difficulties that learners may encounter. Learners need to be exposed to more repeated encounters with complex and difficult words that occur in the mid-frequency and low-frequency categories for learners to acquire knowledge of these words (Mo and Bi 2024). In turn, identifying these word frequency categories can help teachers to understand the most appropriate words to teach learners. 
However, categorizing words into high-/mid-/low-frequency lists can often be time-consuming for teachers, as the terms are rarely presented using these divisions in textbooks.
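Because textbooks rarely present vocabulary in these divisions, the categorization can in principle be automated. The following is a minimal sketch, assuming each word has already been assigned a BNC/COCA 1000-word-family level (1 to 25) by a lexical profiler; the function name `frequency_band` and the example levels are hypothetical and are not taken from any of the studies cited.

```python
def frequency_band(level: int) -> str:
    """Map a BNC/COCA 1000-word-family level (1-25) to a frequency band.

    Band boundaries follow the definition given above:
    levels 1-3 = high, levels 4-8 = mid, level 9 and upwards = low.
    """
    if not 1 <= level <= 25:
        raise ValueError("level must be between 1 and 25")
    if level <= 3:
        return "high"
    if level <= 8:
        return "mid"
    return "low"


# Illustrative levels only (assumed for this example, not looked up in the lists).
example_levels = {"go": 1, "academic": 4, "azalea": 12}
bands = {word: frequency_band(level) for word, level in example_levels.items()}
print(bands)  # {'go': 'high', 'academic': 'mid', 'azalea': 'low'}
```

Keeping the band boundaries in one small function means that a different banding scheme could be tested by changing only two comparisons.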
Further, studies investigating lexical profiles by word frequency have seldom investigated solely listening input in commercial textbooks. For example, O’Loughlin’s (2012) study investigated the listening texts in the New English File textbook series. He analyzed 29,716 running words from the series’ Elementary, Pre-intermediate, and Intermediate textbooks. Although he found that the vocabulary levels increased in difficulty for each level, he also found a variance in the lexical demands of the listening resources included in each textbook. Specifically, O’Loughlin (2012) found that out of a total of 1045 word families used across all 3 levels of the textbook, only 860 of these word families were used in the Intermediate textbook. These results indicate that as the 185 word families included in the first 2 textbook levels (i.e., Elementary and Pre-intermediate) were not included in the Intermediate textbook, learners may find these items difficult to master. This finding indicates the importance of repeating lexical terms throughout different levels of a textbook series. Specifically, learners need to master appropriate levels of vocabulary (i.e., low/mid/high) before studying the next level of the textbook. However, as this is not always possible, repetition of previous lexical items that feature in a textbook series is needed to enable learners to master words from previous textbooks when studying with higher levels of the series. In this example, O’Loughlin (2012) explained how learners may have lexical difficulties when using higher levels of the English File textbook if they do not have the opportunity to master these word families in the first two levels. As Mo and Bi (2024) further explained, more sophisticated words were not included beyond the first 1500 most frequent words, suggesting that words from the higher second and third lists were underrepresented or lacked repeated exposure. 
Therefore, these findings indicate the importance of repeating lexical items in textbooks and using relevant resources to support learners’ vocabulary knowledge, in line with their textbook and learning progression. If words from previous textbooks are not included in later titles, teachers may have to rely on the assumption that learners have achieved a certain vocabulary threshold, resulting in problematic vocabulary for them if they meet unfamiliar lexis in higher textbook levels. Thus, in this study, examining the repetition of the lexical demands for each TED Talk, as used in a textbook series, can help educators to identify the lexical knowledge needed by learners for each sequential unit.

2.3. TED Talks and Vocabulary Learning

Academic vocabulary can be identified using the academic word list (AWL), devised by Coxhead (2000), to understand the occurrence of certain words in academic texts. Specifically, Coxhead’s (2000) word list consists of 570 word families, divided into 10 sublists, which together cover approximately 10% of the running words in academic texts. As learners progress through each subsequent sublist, the words become less frequently used in academic texts. Compared to the items in Nation’s (2012) BNC/COCA 25,000 list, AWL items tend to appear more often in academic rather than non-academic texts, providing a useful indicator of possible vocabulary difficulties for learners in academic contexts.
In academic learning, TED Talks are used as a listening resource in many classrooms (see Liu and Chen 2019; Mojgan and Tollabi 2019). TED Talks emerged as a global non-profit platform to showcase talks on more than 300 research topics listed under 6 categories: Business, Design, Education, Global Issues, Science, and Technology. Since 2007, TED Talks have been freely accessible from the TED website (www.ted.com, accessed on 10 September 2024), where viewers can choose from both native and non-native expert presenters speaking about topics in more than 40 different languages. In English, both first-language and second-language experts speak in a range of Global Englishes, with talks accompanied by resources, such as transcripts and subtitles, that can support L2 instruction. Romanelli et al. (2014) described the suitability of TED Talks in learning as shorter, unrestricted video accounts that help deliver lecture-type topics to learners. Further, TED Talks are often described as high-quality, professionally presented, and culturally relevant presentations that help learners digest a variety of topics. However, Field (2008) noted the generality of authentic resources that are “designed without language learning in mind” (p. 274). Therefore, understanding the vocabulary demands of authentic resources, such as TED Talks, could help determine their suitability for learners. To help ascertain the optimum lexical demands of TED Talks for language learning, previous studies have examined the vocabulary load of selected talks, as shown in Table 2.
As Table 2 shows, a variance in vocabulary knowledge is required by learners to understand these TED Talks. For example, all three studies showed that learners need vocabulary knowledge of between the first 3000 and 5000 word families to achieve 95% understanding. The higher threshold of 98% understanding requires learners to have vocabulary knowledge of between 5000 and 10,000 word families. Coxhead and Walls’ (2012) smaller corpus of 60 TED Talks and Nurmukhamedov’s (2017) larger corpus of 400 TED Talks both found that learners needed 4000–5000 word families for 95% coverage and 9000 word families for 98% coverage. Although Liu and Chen’s (2019) larger corpus of 2089 TED Talks resulted in only 3000 word families for 95% coverage and 5000–7000 word families needed for 98% coverage, the results from all the studies concurred that learners need a large variance of between 5000 and 10,000 words to understand a range of TED Talks consisting of different topics and themes. These thresholds and potential lexical demands highlight the need to understand the individual vocabulary knowledge of the TED Talks used in commercial textbooks to prepare learners for their listening input. Although these previous studies indicate that vocabulary knowledge of between 3000 and 5000 word families may be optimum for Intermediate learners, further research is needed to understand the vocabulary demands, potential language difficulties, and necessary scaffolding for learners when listening to TED Talks included in Pre-intermediate commercial textbooks.

2.4. Academic Vocabulary in TED Talks

Understanding the academic vocabulary content in listening texts is another approach to measuring potential lexical difficulty for learners. For learners intending to study in academia, general academic vocabulary is considered important to learn, as it makes up approximately 9% of academic written texts (Nation 2022). Similarly, Warnby (2023) noted that academic vocabulary constitutes about 10–15% of most written academic texts used for reading, suggesting that academic vocabulary knowledge is important for learners in university studies. As Warnby (2023) explained, learners with lower vocabulary knowledge may find these reading texts, and their consequent lexical burden, more problematic. In listening, researchers (Coxhead and Walls 2012; Nurmukhamedov 2017; Wingrove 2022) have turned to investigating the academic vocabulary in TED Talks to help support learners in understanding the resource better in their English for Academic Purposes (EAP) courses. To date, four studies have examined academic vocabulary coverage in TED Talks using the AWL, as shown in Table 3.
As Table 3 shows, the four studies showed some agreement on the academic vocabulary coverage found in TED Talks. In Nurmukhamedov and Sadler’s (2011) analysis of one TED Talk, a higher academic coverage of 5% was found in Ken Robinson’s “Schools Kill Creativity” talk. However, a more thorough analysis of larger TED Talks corpora conducted in the three other studies (Coxhead and Walls 2012; Nurmukhamedov 2017; Wingrove 2022) showed similar findings of around 4% academic coverage. In line with Coxhead and Walls’ (2012) initial findings of 3.90% in their analysis of 60 TED Talks, the results from these studies suggest that around 4% of the vocabulary in TED Talks is academic. It is worth noting that different academic word lists were used in the four studies: three used the AWL (Coxhead 2000), while one study (Wingrove 2022) also used the academic vocabulary list (AVL) (Gardner and Davies 2014). Although Wingrove’s (2022) AWL results were similar to those of the other studies, at around 4%, the study highlighted discrepancies in coverage between academic word lists. When comparing the results of lexical analyses, the medium coverage varied from 3% to 4% (when using the AWL) to 17–18% (when using the AVL total list). To date, no study has used the updated new academic word list (NAWL) in a lexical analysis of TED Talks. In listening, this observation highlights the importance of identifying the academic items within a TED Talk included in textbooks that could help support learners in their listening comprehension when completing the unit. This study aims to identify the academic coverage found in the specific Pre-intermediate TED Talks textbook using the NAWL, to provide learners with the appropriate lexical support for each unit.

2.5. Multiword Units in Textbooks and TED Talks

Examining multiword units (MWUs) is another crucial approach to understanding potential lexical demands in language learning (Ackermann and Chen 2013; Biber 2009). Defined as “phrases that are made up of words that frequently occur together” (Nation et al. 2016, p. 71), MWU is an umbrella term that incorporates various phrase types, including collocations (Siyanova and Schmitt 2008), idioms (Siyanova-Chanturia and Martinez 2015), lexical bundles (Byrd and Coxhead 2010), formulas (Simpson-Vlach and Ellis 2010), word clusters (McCarthy and Carter 2006), and formulaic sequences (Wood 2002). MWU sequences can be problematic for learners who may interpret each word individually rather than as a chunk of language. Wang and Liu (2023) suggested that increased encounters with both single words and collocations can result in higher gains of vocabulary knowledge for learners. However, in EAP written discourse, several studies have examined the coverage of MWUs (e.g., Ackermann and Chen 2013; Byrd and Coxhead 2010; Wood and Appel 2014), concluding that MWUs rarely occur in written texts and are not explicitly taught. In spoken discourse, Foster (2001) identified that general speech is highly formulaic, while Erman and Warren (2000) noted that approximately 58.6% of the lexicon consists of formulaic sequences. Few studies have investigated spoken MWUs (e.g., Coxhead et al. 2017), although Simpson-Vlach and Ellis’ (2010) earlier study developed a spoken academic formulas list (AFL) containing 607 items based on a 2.1 million word corpus of academic speech. However, to our knowledge, no study has investigated the coverage of MWUs in specific TED Talks. Therefore, this study aims to understand the inclusion of both general and academic MWUs in the selected TED Talks and their coverage in the textbook.

3. Materials and Methods

3.1. Research Questions

This study investigated the lexicon of TED Talks utilized in the commercial textbook, Keynote 2. The results will indicate whether the TED Talks included within a given textbook form a suitable resource for learners. The following research questions guided the study:
  • What are the vocabulary loads of the TED Talks used in Keynote 2?
    1a. How do the vocabulary loads of the TED Talks and Keynote 2 compare?
  • What is the lexical coverage of the academic words in the TED Talks used in Keynote 2?
    2a. How does the lexical coverage of the academic words in the TED Talks and Keynote 2 compare?
  • What are the multiword units (MWUs) that occur in the TED Talks used in Keynote 2?
    3a. How does the coverage of multiword units (MWUs) in the TED Talks and Keynote 2 compare?

3.2. TED Talks in Keynote 2

Keynote 2 (Bohlke et al. 2016) was first published by Cengage in 2015. The commercial textbook was chosen by the Introductory English course instructors at a public university in Japan. The textbook consists of 12 units, each built around a chosen TED Talk for listening practice, and is described as suitable for Pre-intermediate learners (CEFR A2/B2). The 12 units were used to teach first-year university students over a 14-week semester. Each unit’s TED Talk is supported by relevant content-based activities, which focus on each of the four skills (i.e., listening, reading, speaking, and writing). Each unit is divided into five lessons (i.e., Lessons A–E): Lessons A–C prepare learners for the TED Talk topic and content, Lesson D helps learners as they listen to the TED Talk, and Lesson E helps learners apply these ideas to real-world contexts. Additionally, each unit provides learners with topic-related vocabulary, visual aids, and content-based materials to help them to complete the textbook activities. Although Keynote 2 is only one textbook from a series of six, the aim of this study is to understand whether the vocabulary in the TED Talks and the textbook is suitably aligned for Pre-intermediate learners. As textbooks tend to be an integral tool for the teacher, the vocabulary content of the textbook is important to provide adequate guidance for learners of a particular level in the classroom. Wang and Liu (2023) stated that materials are integral in prescribing for the teacher which vocabulary is to be taught, and the sequence and manner in which the words are presented to learners. Extensively analyzing one textbook to understand the vocabulary content of both the listening resources and the textbook content itself can help identify whether authentic resources can be integrated into textbook materials for second language learners.
Completing a thorough analysis of one textbook in the series can also help to indicate if the authentic resources and language content are level appropriate or if further analysis of the complete series is needed.

3.3. Data Collection

Keynote 2 was purchased and prepared for the data analysis. First, the TED Talk from each unit was tabulated and recorded by speaker, title, running time, and number of tokens (i.e., words). Each TED Talk is shown in Table 4.
As the transcripts were not available in the students’ textbook, the TED Talk transcripts were obtained from the official TED Talk website (www.ted.com accessed 10 September 2024) (TED Talks 2024). Each transcript was then cleaned using the following guidelines from Nation et al. (2016):
  • Correct/check language to American English (memorise = memorize)
  • Check spellings of unrecognized words
  • Remove hyphens
  • Remove apostrophes
  • Delete spaces
  • Rewrite contractions (haven’t = have not)
The transcripts were first saved and checked as a Microsoft Word document (.docx file) and then saved and checked as a Notepad document (.txt file). As Nation et al. (2016) suggested, checking the data at least twice helps researchers to notice any missed spaces, apostrophes, or hyphens. The additional check also provides a further opportunity to add words that were not found in the BNC/COCA lists (Nation et al. 2016), such as proper nouns (e.g., Sakura and Woodford), compound words (e.g., basecamp and firewood), or newly coined terms that belong to a word family. If an item satisfies the criteria of levels three to six of Bauer and Nation’s (1993) word family scale (i.e., most frequent regular derivational affixes, frequent and orthographically regular affixes, regular but infrequent affixes, and frequent but irregular affixes), the item can be added to the list. For example, the word selfie was added to BASEWRD1, as the meaning is related to the word family self. Nation et al. (2016) commented on the importance of these checks to continuously update the BNC/COCA word list to ensure recency, validity, and reliability of lexical terms that are used in texts. To compare the vocabulary load of each TED Talk with that of its unit, the textbook was first scanned using Optical Character Recognition (OCR) software and cleaned following the guidelines above. Finally, 12 data files were created, one for each unit in the textbook.
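The cleaning guidelines listed above could be partly automated before the manual checks. The following is a minimal sketch only, not the procedure used in this study: the lookup tables `BRITISH_TO_AMERICAN` and `CONTRACTIONS` are hypothetical and deliberately tiny, lowercasing is an added assumption, hyphens are assumed to be replaced by a space, and "delete spaces" is interpreted as collapsing surplus whitespace.

```python
import re

# Hypothetical, illustrative lookup tables; real transcripts need fuller lists.
BRITISH_TO_AMERICAN = {"memorise": "memorize", "colour": "color"}
CONTRACTIONS = {"haven't": "have not", "don't": "do not", "it's": "it is"}


def clean_transcript(text: str) -> str:
    """Apply the cleaning steps to one transcript, in an assumed order."""
    # Assumption: lowercase everything and normalize curly apostrophes first.
    text = text.lower().replace("\u2019", "'")
    # Rewrite contractions before any apostrophes are removed.
    for short, full in CONTRACTIONS.items():
        text = re.sub(rf"\b{re.escape(short)}\b", full, text)
    # Standardize spellings to American English (whole words only).
    for british, american in BRITISH_TO_AMERICAN.items():
        text = re.sub(rf"\b{re.escape(british)}\b", american, text)
    text = text.replace("-", " ")  # assumption: hyphens become spaces
    text = text.replace("'", "")   # remove remaining apostrophes
    return re.sub(r"\s+", " ", text).strip()  # collapse surplus whitespace


sample = "We haven\u2019t   tried to memorise the well-known speaker's talk."
print(clean_transcript(sample))
# we have not tried to memorize the well known speakers talk.
```

Contractions are expanded before apostrophes are stripped; reversing those two steps would turn haven’t into havent and leave it unrecognized. Spelling checks of unrecognized words and additions to the word lists still require the manual review described above.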

3.4. Data Analysis

Each TED Talk transcript and unit was analyzed using Heatley et al.’s (2002) Range program and Nation’s (2012) BNC/COCA 25,000 lists. The BNC/COCA word list comprises 25 lists of 1000 word families each, drawn from the British National Corpus (Ashton and Burnard 1998) and the Corpus of Contemporary American English (Davies 2008), and is ranked according to frequency and range. Additionally, Nation’s (2012) four supplementary lists (i.e., proper nouns, marginal words, transparent compounds, and abbreviations) were also used in the analysis.
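The cumulative coverage that this kind of analysis produces can be illustrated with a short sketch. It is not a reimplementation of the Range program: it assumes each running word has already been assigned a level (0 for supplementary-list items, 1 to 25 for the BNC/COCA list containing its word family, and any other value for off-list words), and the function names and toy data are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def cumulative_coverage(token_levels: List[int], max_level: int = 25) -> List[float]:
    """Return cumulative % coverage after each 1000-word-family level."""
    if not token_levels:
        raise ValueError("token_levels must not be empty")
    counts = Counter(token_levels)
    total = len(token_levels)
    covered = counts[0]  # supplementary-list items are added from the start
    coverage = []
    for level in range(1, max_level + 1):
        covered += counts[level]
        coverage.append(100 * covered / total)
    return coverage


def level_reaching(coverage: List[float], threshold: float) -> Optional[int]:
    """First level (1-based) whose cumulative coverage meets the threshold."""
    for level, value in enumerate(coverage, start=1):
        if value >= threshold:
            return level
    return None  # threshold never reached within the lists


# Toy text of 100 running words (invented figures, not data from this study).
toy_levels = [0] * 2 + [1] * 88 + [2] * 5 + [3] * 3 + [5] * 2
toy_coverage = cumulative_coverage(toy_levels)
print(level_reaching(toy_coverage, 95), level_reaching(toy_coverage, 98))  # 2 3
```

In the toy text, 95% coverage is reached at the second 1000 word families and 98% at the third. Off-list words count toward the total but are never covered, so a text with many of them may never reach a threshold, in which case `level_reaching` returns `None`.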

3.5. Corpus Analysis

AntWordProfiler (Version 2.2.1) (Anthony 2023), along with the new academic word list (NAWL) (Browne et al. 2013), was used to identify academic items in the TED Talks and textbook. The NAWL is a 957-word list of general academic items and was designed as an updated version of the original academic word list (AWL) by Coxhead (2000). While a spoken academic word list would be more applicable for identifying items, there is currently no lexical profiler that uses the academic spoken word list (Dang et al. 2017). AntWordProfiler (Anthony 2023) allows the user to compare texts and highlight items (in this case, from the NAWL) that occur in each.
The multiword unit profiler (MWUP) (Version 2.0.0) (Eguchi 2021) was used to identify general and academic MWUs in the TED Talks. The phrasal expressions list (Martinez and Schmitt 2012) contains 505 of the most frequent non-transparent expressions in English. The academic formulas list (AFL) (Simpson-Vlach and Ellis 2010) contains 607 written and spoken MWUs from general academic contexts. When the texts were run through MWUP, the program displayed frequency data for the identified items in each word list. Once analyzed, the relative frequency of each MWU was calculated by dividing its frequency by the number of running words and multiplying the result by one million. For example, “part of the” occurred eight times in the TED Talks corpus (18,120 running words), giving a relative frequency of 442 (8 ÷ 18,120 × 1,000,000). In the textbook, the relative frequency of “part of the” was 286 (9 ÷ 31,462 × 1,000,000). As presented in the following Results section, the vocabulary load analysis and the corpus analysis of both the TED Talks and the textbook units indicate the lexical demands placed on learners when using Keynote 2.
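The relative frequency calculation can be checked with a few lines of code. This is a minimal sketch using the figures reported above; the function name `relative_frequency` is hypothetical and is not part of MWUP.

```python
def relative_frequency(count: int, running_words: int, per: int = 1_000_000) -> float:
    """Occurrences of an item per `per` running words (default: per million)."""
    if running_words <= 0:
        raise ValueError("running_words must be positive")
    return count / running_words * per


# Figures reported in the text for the same MWU in the two corpora.
ted_talks = relative_frequency(8, 18_120)  # TED Talks corpus
textbook = relative_frequency(9, 31_462)   # Keynote 2 textbook
print(round(ted_talks), round(textbook))  # 442 286
```

Normalizing to occurrences per million running words is what allows the two corpora, which differ in size, to be compared directly.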

4. Results

This study examined the lexicon of the 12 chosen TED Talks and the textbook content in Keynote 2 using four single and MWU lists. Nation’s (2012) BNC/COCA 25,000-word list was used to understand the vocabulary load and cumulative coverage of each TED Talk and textbook unit. Browne et al.’s (2013) NAWL identified 150 academic words in the TED Talks. Martinez and Schmitt’s (2012) phrasal expressions list identified 141 general phrasal expressions, and Simpson-Vlach and Ellis’ (2010) academic formulas list identified 97 academic MWUs. The following section shows the results for each of the three research questions.

4.1. Vocabulary Load of TED Talks in Keynote 2

Table 5 presents the cumulative coverage of Nation’s (2012) BNC/COCA word lists for each TED Talk used in Keynote 2. As the supplementary lists (SL = proper nouns, marginal words, and compound words) are less of a learning burden compared to other words (Nation and Webb 2011), their coverage is shown at the top of the table for reference and then added accordingly. As illustrated in blue, 10 of the 12 units (Units 2, 3, 4, 5, 6, 7, 8, 9, 10, and 11) achieved 95% coverage between the 2000 and 3000 word families. As illustrated in red, the same units (except Units 4 and 11) also reached 98% between the 3000 and 5000 word families, suggesting that learners would need to know the first 5000 word families to understand a majority of the textbook’s TED Talks. However, Units 1, 11, and 12 showed varying difficulty. For Unit 12, 95% coverage was reached at 4000 word families and 98% coverage at 6000 word families. Units 1 and 11 required a wider lexical range: 95% coverage was achieved at 6000 (Unit 1) and 3000 (Unit 11) word families, and 98% coverage was reached at 8000 word families for both talks. These results suggested that vocabulary difficulty is likely, especially with the Unit 1 and Unit 11 TED Talks, as broader knowledge of mid-frequency and low-frequency vocabulary is needed to understand the content.
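As an illustration of how the 95% and 98% thresholds can be read off a cumulative coverage table, the following Python sketch finds the first 1000-word level at which a target is met. The function name `coverage_threshold` is hypothetical, and the sketch assumes that the Table 5 figures are already cumulative; the values are the first five levels reported for Units 1 and 9.

```python
from typing import Optional, Sequence


def coverage_threshold(cumulative: Sequence[float], target: float) -> Optional[int]:
    """Return the number of word families (in steps of 1000) at which
    cumulative coverage first reaches `target`, or None if the target
    is not reached within the levels supplied."""
    for level, coverage in enumerate(cumulative, start=1):
        if coverage >= target:
            return level * 1000
    return None


# Cumulative coverage (%) for the first five 1000-word lists, from Table 5.
unit_1 = [83.11, 90.24, 93.07, 93.70, 94.64]
unit_9 = [84.86, 94.86, 98.19, 99.24, 99.34]

print(coverage_threshold(unit_9, 95.0))  # 3000
print(coverage_threshold(unit_9, 98.0))  # 3000
print(coverage_threshold(unit_1, 95.0))  # None (not reached within 5000)
```

The `None` result for Unit 1 reflects that its talk does not reach 95% within the first 5000 word families, consistent with the 6000 word family threshold reported above.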
Table 6 compares the cumulative coverage of the combined 12 TED Talks scripts with the coverage of the other unit content in Keynote 2. The results showed that the TED Talks were slightly more lexically demanding than the textbook. A vocabulary size of 5000 word families is needed for 98% coverage of the TED Talks, compared to 4000 word families for the textbook. Although further analysis is needed to identify the specific vocabulary types and compare the lexical content of the Keynote 2 units and the TED Talks, the results showed that the vocabulary level of the textbook did not match the lexical demands of the TED Talks.

4.2. Lexical Coverage of Academic Words in TED Talks Used in Keynote 2

Table 7 shows the frequency and coverage of the 150 academic words identified in the TED Talks from the NAWL (Browne et al. 2013). Additionally, Table 7 shows the items’ frequency and their coverage in each unit of the textbook. The results highlighted three key points. First, coverage of academic words in the TED Talks was low, averaging only 1.24% across the 12 talks. Second, only 9 of the 150 items occurred in more than one TED Talk (i.e., amongst, prey, sub, publish, essentially, reconstruct, afterwards, goods, and incredible). This suggests that there was very little repetition of these items across the chosen talks. Finally, the academic words rarely occurred in the corresponding unit in the textbook. Only ancestor (7 times), lab (7 times), and fossil (10 times) occurred more than three times in the corresponding unit. The other items (see Table 7) seldom occurred in the unit. This indicates that the textbook does not provide frequent opportunities for learners to encounter the academic words that they will hear in the TED Talks. Thus, the results indicate a lack of consistency in the academic vocabulary coverage between the TED Talks and the textbook.
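The two measures discussed here, academic word coverage and repetition of items across talks, can be sketched in Python as follows. This is a simplified illustration and not a description of how AntWordProfiler works: it matches exact word forms rather than word families, and the function names, the three-word list, and the two sample sentences are all hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into simple alphabetic tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def academic_coverage(tokens: List[str], academic_words: Set[str]) -> float:
    """Percentage of running words that appear in the academic word list."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in academic_words)
    return hits / len(tokens) * 100


def items_in_multiple_texts(texts: Dict[str, List[str]],
                            academic_words: Set[str]) -> Set[str]:
    """Academic items that occur in more than one text."""
    text_counts: Counter = Counter()
    for tokens in texts.values():
        text_counts.update(set(tokens) & academic_words)
    return {word for word, n in text_counts.items() if n > 1}


# Hypothetical stand-ins for an academic word list and two transcripts.
academic_words = {"fossil", "ancestor", "prey"}
texts = {
    "talk_a": tokenize("The fossil shows that its prey was large."),
    "talk_b": tokenize("Vultures rarely hunt live prey."),
}

print(academic_coverage(texts["talk_a"], academic_words))  # 25.0
print(items_in_multiple_texts(texts, academic_words))      # {'prey'}
```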

4.3. Multiword Units Coverage of the TED Talks in Keynote 2

Table 8 presents the relative frequency per million of the 141 phrasal expressions and 97 academic formulas identified in the TED Talks and their relative frequency in the textbook. Upon closer analysis, eight phrasal expressions (e.g., have to, a lot, and a few) identified in the TED Talks had a relative frequency of over 501. In the textbook, only one expression (a lot) had a relative frequency of over 501. This analysis indicates that coverage of the same MWUs in the textbook and the TED Talks could be aligned more closely when designing activities in the curriculum design process. Similar to academic single words (see Table 7), Table 8 shows that there were fewer academic items (97) than general MWUs (141) included in the textbook. Overall, the relative frequency of general and academic MWUs was higher in the TED Talks than in the textbook.

5. Discussion

The analysis showed that the TED Talks and the textbook did not adequately support each other in developing vocabulary knowledge: the TED Talks are lexically more demanding than the textbook. In line with previous research, the TED Talks selected for this textbook had cumulative thresholds of 3000 word families for 95% and 5000 word families for 98% coverage (see Elk 2014; Liu and Chen 2019). However, there were several inconsistencies, with Units 1, 11, and 12 reaching the 95% threshold between 3000 and 6000 word families and the higher 98% threshold between 5000 and 8000 word families, figures that are similar to those of other studies (Coxhead and Walls 2012; Nurmukhamedov 2017). In contrast to Field’s (2008) suggestions of language difficulty progressing systematically, these findings suggest that the TED Talks chosen for this textbook may not grow in difficulty sequentially. As the results illustrated, earlier units (e.g., Unit 1) are more difficult to understand than later units (e.g., Unit 4). The vocabulary level of TED Talks in textbooks should be monitored more closely to ensure that the level of difficulty develops in line with the learners’ expected language progress. As the topic of each unit is based on the TED Talks, it would seem that the talks are not a standalone resource and, therefore, the textbook should be preparing the learner to understand the talk by introducing critical vocabulary. However, in the current state, teachers would have to identify the vocabulary load of both the textbook and the TED Talk, and prime learning of specific vocabulary items to bridge the gap between learners’ vocabulary knowledge and the 8000 word family threshold needed to achieve 98% coverage.
While studies have noted TED Talks can be appropriate learning resources (Liu and Chen 2019; Mojgan and Tollabi 2019; Romanelli et al. 2014), it is important to note that the resource was not designed for academia. As Field (2008) describes, authentic resources, such as TED Talks, were not designed primarily for language learning activities. These observations point to the limitation of using TED Talks in learning, as the lack of control over the vocabulary may not be suitable for second language learners. Thus, the results of this study indicate that more attention should be paid to analyzing the specific language needed by learners in TED Talk resources to use in correlation with textbook activities. TED Talks used in textbooks, such as Keynote 2, could better prepare learners by understanding the language used by the speaker, categorizing the items by frequency or word type (i.e., academic and multiword units), and focusing on repetition of these terms throughout the textbook activities. For example, introducing and repeating vocabulary that is key to the video could expose learners to the items they need to comprehend the TED Talk. Therefore, repeating this study’s analysis of Keynote 2 with the other textbooks in the series could further ascertain the lexical needs that learners may need to pay more attention to in lessons when using authentic resources.
The corpus analysis also revealed two results. First, the TED Talks analysis contained few instances of academic vocabulary, with only 1.24% coverage from the NAWL (Browne et al. 2013). Further, in the textbook content itself, only 0.99% academic coverage of the items from the TED Talks was found. In both instances, the coverage was below previous findings of 4%, which could be a result of the analysis using the NAWL instead of the AWL (Coxhead 2000) and AVL (Gardner and Davies 2014) (see Table 3). In this study, the researchers deemed the NAWL as the most appropriate analytical tool because it is a larger and updated corpus of 288 million tokens (Browne et al. 2013), compared to the AWL corpus of 3.5 million tokens (Coxhead 2000). However, further research could consider whether 4% coverage (as suggested by studies analyzing academic vocabulary using the AWL) is appropriate or if an alternative NAWL threshold is needed. As the findings in this study indicated a lower NAWL coverage of 1.24%, the AWL coverage threshold of 4% may not be applicable in this instance. Further analyses using the NAWL (Browne et al. 2013) could identify the most appropriate coverage threshold for academic vocabulary in TED Talks.
Warnby (2023) also noted how the inclusion of academic vocabulary may correlate positively with general vocabulary to provide more exposure for learners to encounter these words. However, learners listening to the TED Talks in this textbook may find the listening selections are not providing adequate coverage of academic vocabulary in the corresponding unit content. These findings indicate that the selected TED Talks may not provide learners with the suitable learning opportunities that they need in the textbook to acquire the academic vocabulary included as their language learning progresses. Thus, learners using this textbook (and the chosen TED Talks) may not develop the necessary academic vocabulary knowledge needed for their studies. This may lead to comprehension difficulties, fixation, or loss of focus, as the learners are unable to process the input they hear in real time (Goh 2000). These results suggest that learners need more exposure to these—and other—academic items in TED Talks; otherwise, they may not be sufficiently prepared to develop their academic vocabulary knowledge as they progress through the textbook.
Second, while general and academic MWUs are evident in the TED Talks, they are not sufficiently covered, recycled, or repeated throughout the textbook. Similar to Wang and Liu’s (2023) findings, the lack of exposure to MWUs means that these items may remain unlearned, as their infrequent repetition does not allow learners to develop their knowledge of these terms. As O’Loughlin (2012) found in his study, the lack of repetition opportunities may cause vocabulary knowledge difficulties for learners. For example, if learners have not mastered the words from previous units or textbook levels, it may be incorrectly assumed that they have acquired adequate vocabulary knowledge. Wang and Liu (2023) emphasized the important correlation between repeated encounters of collocations and vocabulary knowledge for better vocabulary gains. These results suggest that although coverage of general vocabulary may be sufficient, academic vocabulary and MWUs are not as frequently repeated when developing textbook materials. Unit activities could feature corresponding language to link the textbook to the chosen TED Talk. As Keynote 2 is considered a textbook to be used for academic purposes, the need to include both academic and MWU items within the textbook activities is more pertinent, as learners may encounter lexical difficulties if they do not repeatedly meet these items before listening. With over 3500 TED Talks currently available (TED Talks 2024), choosing TED Talks and facilitating textbook activities that include a minimum of 4% academic vocabulary should be prioritized so learners can recycle and acquire these words. These results strongly suggest that the textbook activities could include more suitable content that exposes learners to repeated vocabulary and supports the vocabulary knowledge needed for the chosen TED Talks.
As the next section discusses, several possible solutions could provide learners with the necessary exposure to academic vocabulary and MWUs when using Keynote 2.

6. Pedagogical Implications

There are three main implications for simplifying the lexical demands of both the TED Talks and the textbook from this study. First, concerning the repetition of vocabulary, academic or multiword-unit-specific word lists could be used to prime learners before listening or help scaffold their learning while listening with Keynote 2. As the results showed, although the chosen TED Talks included specific word types, the textbook did not use these in activities. Therefore, specific word lists could help facilitate learning by allowing learners to encounter these academic words and MWUs more frequently before they listen so they can focus on their comprehension while listening. Hloba (2016) suggested that learners can use transcripts to identify the words before completing information transfer activities (i.e., gap fills) to prepare learners to focus on form in productive activities (i.e., spoken or written post-listening tasks). Using word lists can help facilitate explicit vocabulary instruction to help learners pay attention to new or unknown items throughout the listening process (Goh 2000).
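One way a teacher might draft such a priming list is sketched below in Python. This is an illustrative assumption rather than a procedure used in this study: the function name `priming_list`, the default cut-off of three occurrences, and the sample tokens are hypothetical. The sketch selects target-list items that occur in a talk but are rarely met in the corresponding textbook unit.

```python
from collections import Counter
from typing import Iterable, List, Set


def priming_list(talk_tokens: Iterable[str],
                 unit_tokens: Iterable[str],
                 target_words: Set[str],
                 min_unit_occurrences: int = 3) -> List[str]:
    """Target-list items heard in the talk that occur fewer than
    `min_unit_occurrences` times in the textbook unit (illustrative cut-off)."""
    unit_counts = Counter(unit_tokens)
    talk_items = set(talk_tokens) & target_words
    return sorted(w for w in talk_items if unit_counts[w] < min_unit_occurrences)


# Hypothetical tokens standing in for a transcript and a textbook unit.
talk = ["the", "fossil", "prey", "ancestor"]
unit = ["fossil"] * 10 + ["the"] * 5
targets = {"fossil", "prey", "ancestor"}

print(priming_list(talk, unit, targets))  # ['ancestor', 'prey']
```

Items that the unit already recycles often (here, fossil) are left out, so the resulting list concentrates on the words learners are least likely to have met before listening.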
Next, supplementary materials (such as visual prompts) could help support learners in developing their academic vocabulary knowledge when using Keynote 2. As Table 8 shows, the relative frequency of MWUs was higher in the chosen TED Talks than in the textbook content. One approach to bridging this gap is to introduce learners to vocabulary strategies (i.e., predicting and inferencing) to help develop their vocabulary knowledge of MWUs. For example, learners could be provided with visual prompts or matching activities to help introduce some of the low-frequency units that are included in the TED Talks. Using materials that help learners to predict or learn the MWUs allows learners to engage in real-time comprehension. Learners can then address their lexical difficulties by predicting meaning, inferencing potential topic and vocabulary knowledge gaps, and connecting images to interpretations of the TED Talks. Simultaneously, as Field (2008) noted, teachers can provide learners with more exposure to real-world topics and real-time listening practices using authentic resources that scaffold learners’ vocabulary knowledge for more difficult TED Talks in Keynote 2.
Finally, teachers can consider using other textbook activities and alternative TED Talks for more adequate practice within their teaching contexts. As this study’s analysis of Keynote 2 has shown, the vocabulary demands for each unit and the specific listening text can differ from the suggested vocabulary exposure needed in the textbook. More attention to choosing units and resources that reflect the support needed to achieve the specified vocabulary knowledge level could provide more effective listening practice when using the textbook or alternative resources. As previous studies showed, there is a discrepancy between textbooks’ vocabulary demands of 2000–4000 words and TED Talks’ vocabulary demands of 3000–5000 words (see Table 1 and Table 2). Mo and Bi (2024) suggested two principles to ensure more sufficient exposure to vocabulary in textbooks. First, words found in all the frequency lists should be representative of the vocabulary found in both the TED Talks and the textbook. Second, more sophisticated mid-frequency and low-frequency vocabulary from the TED Talks should be included in textbooks to enhance the vocabulary scope of textbook materials and allow learners to learn words that they may or may not have previously learned. Therefore, more attention to choosing textbooks or resources that carefully align with these vocabulary demands could provide learners with ample scaffolding, repetition, and exposure to vocabulary, helping them to maximize their vocabulary learning in listening practice.

7. Limitations and Future Research

This study has two main limitations. First, the current analysis of TED Talks used in one commercial textbook cannot represent the entirety of this or any other textbook series. Currently, Keynote has six textbooks in the series, ranging from elementary to advanced proficiency levels, which use TED Talks as their primary listening input. Furthermore, Cengage Learning has developed another textbook series (i.e., World English) that also uses TED Talks. Further lexical analysis and comparison of these two series could help establish if there is any correlation between the lexical demands of the chosen TED Talks and the textbook materials. For future research, a larger study that includes the TED Talks used in all the Keynote (i.e., Keynote 1, 3, 4, Advanced, and Professional) and World English textbooks could focus on identifying if the lexical choices are appropriate and contain suitable coverage of academic vocabulary. This analysis could also help to validate whether the TED Talks and textbook findings from Keynote 2 in this study are replicated in the whole Keynote series and other textbooks.
Second, although this study focused on academic vocabulary and MWUs, further research is needed to examine specific word types in more depth. Using the NAWL (Browne et al. 2013) could further validate or create new thresholds considering the established 4% suggested coverage from the AWL (Coxhead 2000). Further, using tools such as the academic spoken word list (ASWL) (Dang et al. 2017) could provide a more thorough corpus analysis of spoken texts to help establish specific spoken word lists for each textbook. The findings from such studies can then assist teachers and learners in comprehending TED Talks used in commercial textbooks. In turn, a pedagogically orientated study could be conducted to investigate if these TED-Talk-specific word lists help learners to successfully develop their vocabulary knowledge. Measuring learners’ vocabulary knowledge by analyzing their written notes and summaries for MWUs and academic vocabulary content could also provide valuable insights into the use of word lists and their potential to address learners’ individual vocabulary difficulties.

8. Conclusions

This paper presented an in-depth lexical analysis of 12 TED Talks and the unit content used in the commercial textbook, Keynote 2. The results showed that the TED Talk selections and the textbook provided learners with limited repetition activities, limited task practices, and limited exposure to academic vocabulary. Specifically, three units included a higher percentage of mid-frequency and low-frequency words, which were lexically more difficult than the textbook’s prescribed level of Pre-intermediate learners. Further, the NAWL and academic vocabulary analysis showed that the TED Talks and the textbook did not include recycling or repetitive opportunities. Similarly, the MWUP analysis showed that the TED Talks’ MWU items were not highly prioritized or recycled in textbook activities. Therefore, teachers and learners need to identify the potential difficulty of these word types in listening texts to prime learners sufficiently for the listening text. Creating supplementary academic and MWU-specific word lists can help prepare learners to identify these terms before they watch the TED Talks. With further development of textbook materials, lexically appropriate listening tasks can be tailored to supplement learners using these textbooks to provide them with the scaffolding needed to achieve their language learning goals.

Author Contributions

Conceptualization: N.M.-R. and S.B.; Methodology: N.M.-R. and S.B.; Software: S.B.; Validation: N.M.-R. and S.B.; Formal analysis: N.M.-R. and S.B.; Investigation: N.M.-R. and S.B.; Resources: N.M.-R.; Data curation: N.M.-R. and S.B.; Writing—original draft preparation: N.M.-R. and S.B.; Writing—review and editing: N.M.-R. and S.B.; Visualization: N.M.-R. and S.B.; Supervision: N.M.-R.; Project administration: N.M.-R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ackermann, Kirsten, and Yu-Hua Chen. 2013. Developing the academic collocation list (ACL) – A corpus-driven and expert-judged approach. Journal of English for Academic Purposes 12: 235–47.
  2. Anthony, Laurence. 2023. AntWordProfiler (Version 2.2.1) [Computer Software]. Available online: https://www.laurenceanthony.net/software/antwordprofiler/ (accessed on 10 September 2024).
  3. Ashton, Guy, and Lou Burnard. 1998. The BNC Handbook. Edinburgh: Edinburgh University Press.
  4. Bauer, Laurie, and Ian Stephen Paul Nation. 1993. Word families. International Journal of Lexicography 6: 253–79.
  5. Benson, Stuart, and Naheen Madarbakus-Ring. 2021. A Comparison of Textbook Vocabulary Load Analysis. Vocabulary Learning and Instruction. Available online: https://vli-journal.org/wp/vli-issue-102/ (accessed on 10 September 2024).
  6. Biber, Douglas. 2009. A corpus-driven approach to formulaic language in English: Multi-word patterns in speech and writing. International Journal of Corpus Linguistics 14: 275–311.
  7. Bohlke, David, Paul Dummett, Lewis Lansford, and Helen Stephenson. 2016. Keynote 2. Boston: Cengage Heinle.
  8. Browne, Charles, Brent Culligan, and Joseph Phillips. 2013. The New General Service List Website. Available online: https://www.newgeneralservicelist.com/ (accessed on 10 September 2024).
  9. Byrd, Pat, and Averil Coxhead. 2010. On the other hand: Lexical bundles in academic writing and in the teaching of EAP. University of Sydney Papers in TESOL 5: 31–64.
  10. Coxhead, Averil. 2000. A new academic word list. TESOL Quarterly 34: 213–38.
  11. Coxhead, Averil, and Roz Walls. 2012. TED Talks, Vocabulary and Listening for EAP. TESOL ANZ 20: 55–68. Available online: https://www.tesolanz.org.nz/wp-content/uploads/2019/12/TESOLANZ_Journal_Vol20_2012.pdf#page=66 (accessed on 10 September 2024).
  12. Coxhead, Averil, Thi Ngoc Yen Dang, and Shota Mukai. 2017. Single and multi-word unit vocabulary in university tutorials and laboratories: Evidence from corpora and textbooks. Journal of English for Academic Purposes 30: 66–78.
  13. Dang, Thi Ngoc Yen, Averil Coxhead, and Stuart Webb. 2017. The academic spoken word list. Language Learning 67: 959–97.
  14. Davies, Mark. 2008. The Corpus of Contemporary American English: 520 Million Words, 1990–Present. Available online: http://corpus.byu.edu/coca/ (accessed on 10 September 2024).
  15. Eguchi, Masaki. 2021. Multi-Word Units Profiler. (Version 2.0.0) [Computer Software]. Available online: https://multiwordunitsprofiler.pythonanywhere.com (accessed on 10 September 2024).
  16. Elk, Carolyn K. 2014. Beyond Mere Listening Comprehension: Using TED Talks and Metacognitive Activities to Encourage Awareness of Errors. International Journal of Innovation in English Language Teaching and Research 3: 215–47. Available online: https://search.proquest.com/docview/1655287119?accountid=14782 (accessed on 10 September 2024).
  17. Erman, Britt, and Beatrice Warren. 2000. The idiom principle and the open-choice principle. Text 20: 29–62.
  18. Field, John. 2008. Listening in the Language Classroom. New York: Cambridge University Press.
  19. Foster, Pauline. 2001. Rules and routines: A consideration of their role in the task-based language production of native and non-native speakers. In Researching Pedagogic Tasks: Second Language Learning, Teaching, and Testing. Edited by Martin Bygate, Peter Skehan and Merrill Swain. Harlow: Longman, pp. 75–93.
  20. Gardner, Dee, and Mark Davies. 2014. A new academic vocabulary list. Applied Linguistics 35: 305–27.
  21. Goh, Christine. 2000. A cognitive perspective on language learners’ listening comprehension problems. System 28: 55–75.
  22. Hajiyeva, Konul. 2015. A corpus-based lexical analysis of subject-specific university textbooks for English majors. Ampersand 2: 136–44.
  23. Heatley, Alex, Ian Stephen Paul Nation, and Averil Coxhead. 2002. Range Computer Programme. Available online: https://www.victoria.ac.nz/lals/about/staff/paul-nation (accessed on 10 September 2024).
  24. Hloba, Olena. 2016. TED Speeches as a Tool to Improve Listening Skills of Students Majoring in Psychology. Rome: Edizioni Magi, vol. 4, pp. 1–4. Available online: https://enpuir.npu.edu.ua/handle/123456789/10569 (accessed on 10 September 2024).
  25. Laufer, Batia. 1992. How much lexis is necessary for reading comprehension? In Vocabulary and Applied Linguistics. Berlin and Heidelberg: Springer, pp. 126–32.
  26. Liu, Chen Yu, and Howard Hao-Jen Chen. 2019. Academic spoken vocabulary in TED Talks: Implications for academic listening. English Teaching and Learning 43: 353–68.
  27. Macalister, John, and Ian Stephen Paul Nation. 2020. Language Curriculum Design. London: Routledge.
  28. Martinez, Ron, and Norbert Schmitt. 2012. A phrasal expressions list. Applied Linguistics 33: 299–320.
  29. Matsuoka, Warren, and David Hirsh. 2010. Vocabulary Learning through Reading: Does an ELT Course Book Provide Good Opportunities? Reading in a Foreign Language 22: 56–70.
  30. McCarthy, Michael, and Ronald Carter. 2006. This that and the other: Multi-word clusters in spoken English and visible patterns of interaction. In Explorations in Corpus Linguistics. Edited by Michael McCarthy. Cambridge: Cambridge University Press, pp. 7–26.
  31. Mo, Junhua, and Peng Bi. 2024. Evaluation of Vocabulary Use in EFL Textbooks: Evidence from Curriculum Words. English Teaching and Learning, 1–16.
  32. Mojgan, Rashtchi, and Mohammad Mazraehno Reza Tollabi. 2019. Exploring Iranian EFL learners’ listening skills via TED Talks: Does medium make a difference? Journal of Language and Education 5: 81–97.
  33. Morrow, Keith. 1977. Authentic texts and ESP. In English for Specific Purposes. Edited by Susan Holden. London: Modern English Publications, pp. 13–17.
  34. Nation, Ian Stephen Paul. 2006. How large a vocabulary is needed for reading and listening? Canadian Modern Language Review 63: 59–82.
  35. Nation, Ian Stephen Paul. 2012. The BNC/COCA Word Family Lists (17 September 2012). Unpublished Paper. Available online: www.victoria.ac.nz/lals/about/staff/paul-nation (accessed on 10 September 2024).
  36. Nation, Ian Stephen Paul. 2022. Learning Vocabulary in Another Language, 3rd ed. Cambridge: Cambridge University Press.
  37. Nation, Ian Stephen Paul, and Stuart A. Webb. 2011. Researching and Analyzing Vocabulary. Boston: Heinle, Cengage Learning.
  38. Nation, Ian Stephen Paul, Dongwang Shin, and Lynn Grant. 2016. Multiword units. In Making and Using Word Lists for Language Learning and Testing. Edited by Ian Stephen Paul Nation. Amsterdam: John Benjamins.
  39. Nurmukhamedov, Ulugbek. 2017. Lexical coverage of TED Talks: Implications for vocabulary instruction. TESOL Journal 8: 768–90.
  40. Nurmukhamedov, Ulugbek, and Randall Sadler. 2011. Podcasts in four categories: Applications to language learning. In Academic Podcasting and Mobile Assisted Language Learning. Edited by Betty Rose Facer and M’hammed Abdous. Portland: Book News, pp. 176–95.
  41. O’Loughlin, Richard. 2012. Tuning in to vocabulary frequency in coursebooks. RELC Journal 43: 255–69.
  42. Romanelli, Frank, Jeff Cain, and Patrick J. McNamara. 2014. Should TED talks be teaching us something? American Journal of Pharmaceutical Education 78: 113.
  43. Schmitt, Norbert, and Diane Schmitt. 2014. A reassessment of frequency and vocabulary size in L2 vocabulary teaching. Language Teaching 47: 484–503.
  44. Simpson-Vlach, Rita, and Nick C. Ellis. 2010. An academic formulas list: New methods in phraseology research. Applied Linguistics 31: 487–512.
  45. Siyanova, Anna, and Norbert Schmitt. 2008. L2 learner production and processing of collocation: A multi-study perspective. Canadian Modern Language Review 64: 429–58.
  46. Siyanova-Chanturia, Anna, and Ron Martinez. 2015. The idiom principle revisited. Applied Linguistics 36: 549–69.
  47. Sun, Ye, and Thi Ngoc Yen Dang. 2020. Vocabulary in high-school EFL textbooks: Texts and learner knowledge. System 93: 102279.
  48. TED Talks. 2024. TED Talks Website. Available online: https://www.ted.com/talks (accessed on 10 September 2024).
  49. Van-Zeeland, Hilde, and Norbert Schmitt. 2013. Lexical coverage in L1 and L2 listening comprehension: The same or different from reading comprehension? Applied Linguistics 34: 457–79.
  50. Wang, Chen, and Yuhua Liu. 2023. Measuring Vocabulary Use in Chinese Tertiary Textbooks: Potentials for Incidental Vocabulary Learning. Asian Journal of English Language Teaching 32: 131–54.
  51. Warnby, Marcus. 2023. Academic vocabulary knowledge among adolescents in university preparatory programmes. Journal of English for Academic Purposes 61: 101203.
  52. West, Michael. 1953. A General Service List of English Words: With Semantic Frequencies and a Supplementary Word-List for the Writing of Popular Science and Technology. Harlow: Longman.
  53. Wingrove, Peter. 2022. Academic lexical coverage in TED talks and academic lectures. English for Specific Purposes 65: 79–94.
  54. Wood, David. 2002. Formulaic language acquisition and production: Implications for teaching. TESL Canada Journal 20: 1–15.
  55. Wood, David C., and Randy Appel. 2014. Multiword constructions in first year business and engineering university textbooks and EAP textbooks. Journal of English for Academic Purposes 15: 1–13.
  56. Yang, Lu, and Averil Coxhead. 2020. A corpus-based study of vocabulary in the new concept English textbook series. RELC Journal 53: 1–15.
Table 1. Previous studies examining the vocabulary load of commercial textbooks (adapted from Benson and Madarbakus-Ring 2021).

| Study | Textbook | 95% Coverage | 98% Coverage |
|---|---|---|---|
| Matsuoka and Hirsh (2010) | New Headway (Upper Intermediate) | 2000 | - |
| Hajiyeva (2015) | University Textbook (11 titles) | 3500 | 9000 |
| Coxhead et al. (2017) | ESP Textbooks (15 titles) | 3000 | 7000 |
| Sun and Dang (2020) | Yilin (Four levels) | 3000 | 9000 |
| Yang and Coxhead (2020) | New Concept English (Two levels) | 3000; 4000 | 5000; 6000 |
Table 2. Vocabulary load findings from selected TED Talks.

| Study | Coxhead and Walls (2012), 60 Talks: 95% | Coxhead and Walls (2012), 60 Talks: 98% | Liu and Chen (2019), 2089 Talks: 95% | Liu and Chen (2019), 2089 Talks: 98% | Nurmukhamedov (2017), 400 Talks: 95% | Nurmukhamedov (2017), 400 Talks: 98% |
|---|---|---|---|---|---|---|
| All Talks | 4000 | 9000 | - | - | 4000 | 8000 |
| Business | 5000 | 9000 | - | - | 4000 | 8000 |
| Culture | - | - | 3000 | 6000 | - | - |
| Design | 5000 | 8000 | 3000 | 6000 | - | - |
| Entertainment | 5000 | 8000 | 3000 | 6000 | - | - |
| Global Issues | 5000 | 9000 | 3000 | 5000 | 4000 | 8000 |
| Science | 5000 | 8000 | 3000 | 7000 | 4000 | 10,000 |
| Technology | 5000 | 9000 | 3000 | 6000 | 4000 | 8000 |

Note: the 95%/98% threshold was calculated according to Nation (2012).
Table 3. Previous studies examining the academic vocabulary coverage of TED Talks.

| Study | TED Talks Corpus | Word List Analysis | Academic Vocabulary Coverage |
|---|---|---|---|
| Nurmukhamedov and Sadler (2011) | Ken Robinson “Schools Kill Creativity” (1 TED Talk) | AWL | 5% |
| Coxhead and Walls (2012) | Six by Six corpus (60 TED Talks) | AWL | 3.90% |
| Nurmukhamedov (2017) | The TED Talks Corpus (400 TED Talks) | AWL | 3.79% |
| Wingrove (2022) | TED Talks Corpus (2483 TED Talks) | AWL/AVL | 4.09% |

Note: all studies used the academic word list (AWL) (Coxhead 2000) for the vocabulary analysis.
Table 4. Keynote 2 listening texts by unit.
| Unit | TED Talk | Listening Time (min.s) | Number of Tokens |
| --- | --- | --- | --- |
| 1 | Munir Virani: “Why I Love Vultures” | 6.03 | 954 |
| 2 | A.J. Jacobs: “The World’s Largest Family Reunion” | 9.44 | 1466 |
| 3 | Ann Morgan: “My Year Reading a Book from Every Country in the World” | 12.04 | 1922 |
| 4 | Daria van den Bercken: “Why I Take the Piano on the Road and in the Air” | 9.31 | 582 |
| 5 | Roman Mars: “Why City Flags May be the Worst Designed Thing You Have Ever Noticed” | 18.09 | 3017 |
| 6 | Jarrett Krosoczka: “How a Boy Became an Artist” | 18.40 | 3283 |
| 7 | Andras Forgacs: “Leather and Meat Without Killing Animals” | 8.59 | 1149 |
| 8 | Alessandra Orofino: “It Is Our City” | 15.16 | 1902 |
| 9 | Joy Sun: “Should You Donate Differently?” | 7.35 | 1050 |
| 10 | Tan Le: “A Headset that Reads Your Brainwaves” | 10.31 | 1440 |
| 11 | Louie Schwartzberg: “The Hidden Beauty of Pollination” | 7.33 | 415 |
| 12 | Nizar Ibrahim: “How We Unearthed Spinosaurus” | 6.03 | 940 |
| Total | | 128.08 min/s | 18,120 tokens |
Table 5. Vocabulary load of TED Talks across Nation’s (2012) 25,000 BNC/COCA word lists.
| Word List | U1 | U2 | U3 | U4 | U5 | U6 | U7 | U8 | U9 | U10 | U11 | U12 | TOTAL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SL 31–34 | 5.02 | 3.96 | 4.68 | 6.70 | 6.40 | 2.98 | 1.83 | 4.27 | 2.00 | 2.64 | 2.63 | 4.80 | 3.99 |
| 1 | 83.11 | 88.48 | 89.28 | 91.41 | 83.86 | 90.95 | 78.24 | 85.24 | 84.86 | 81.39 | 87.28 | 86.71 | 85.90 |
| 2 | 90.24 | 94.28 | 94.69 | 94.85 | 93.64 | 95.64 | 86.77 | 92.81 | 94.86 | 91.46 | 91.72 | 90.43 | 92.61 |
| 3 | 93.07 | 97.76 | 98.18 | 96.57 | 96.89 | 97.56 | 95.91 | 97.96 | 98.19 | 95.49 | 95.56 | 92.66 | 96.31 |
| 4 | 93.70 | 98.37 | 98.70 | 97.60 | 97.82 | 98.47 | 97.04 | 98.59 | 99.24 | 97.02 | 96.37 | 95.21 | 97.34 |
| 5 | 94.64 | 98.98 | 98.96 | 97.94 | 98.42 | 98.71 | 98.43 | 99.12 | 99.34 | 98.48 | 97.38 | 97.87 | 98.18 |
| 6 | 95.37 | 99.46 | 99.01 | 99.49 | 98.75 | 99.01 | 98.69 | 99.70 | 99.53 | 99.10 | 97.58 | 98.19 | 98.65 |
| 7 | 95.79 | 99.53 | 99.06 | 99.49 | 99.11 | 99.10 | 98.95 | 99.81 | 99.63 | 99.24 | 97.58 | 98.72 | 98.83 |
| 8 | 98.73 | 99.87 | 99.22 | 99.66 | 99.54 | 99.39 | 98.68 | 99.86 | 99.92 | 99.45 | 99.20 | 98.56 | 99.34 |
Note: U = unit. Blue indicates 95% coverage and red indicates 98% coverage; SL = supplementary lists (31–34).
Table 6. Cumulative coverage of Nation’s (2012) BNC/COCA lists in the TED Talks and Keynote 2.
| Word List (By 1000 Words per List) | Cumulative Coverage of TED Talks | Cumulative Coverage of Keynote 2 |
| --- | --- | --- |
| Supp. lists | 3.99 | 4.62 |
| 1st 1000 | 85.90 | 86.00 |
| 2nd 1000 | 92.61 | 94.47 |
| 3rd 1000 | 96.31 | 97.49 |
| 4th 1000 | 97.34 | 98.23 |
| 5th 1000 | 98.18 | 98.91 |
| 6th 1000 | 98.65 | 99.29 |
| 7th 1000 | 98.83 | 99.45 |
| 8th 1000 | 99.34 | 99.57 |
| 9th–25th 1000 | 100 | 100 |
Note: Blue indicates 95% coverage and red indicates 98% coverage; 9th–25th 1000 = word lists from the first 9000 to the first 25,000 lists.
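As a rough illustration of how the 95% and 98% thresholds in Table 6 can be read off, the sketch below scans the cumulative coverage figures and reports the first 1000-word band at which each threshold is reached. The figures are copied from Table 6; the function name `first_band_reaching` and the data layout are assumptions made for this example and are not the authors' procedure or tooling.

```python
# Cumulative coverage (%) copied from Table 6, in band order.
# The supplementary-list row and the 9th-25th row are left out, so only
# the 1st to 8th 1000-word bands are scanned here.
BANDS = [
    "1st 1000", "2nd 1000", "3rd 1000", "4th 1000",
    "5th 1000", "6th 1000", "7th 1000", "8th 1000",
]
TED_TALKS = [85.90, 92.61, 96.31, 97.34, 98.18, 98.65, 98.83, 99.34]
KEYNOTE_2 = [86.00, 94.47, 97.49, 98.23, 98.91, 99.29, 99.45, 99.57]


def first_band_reaching(coverage, threshold):
    """Return the first band whose cumulative coverage meets the threshold.

    Hypothetical helper: returns None if no scanned band reaches it.
    """
    for band, value in zip(BANDS, coverage):
        if value >= threshold:
            return band
    return None


if __name__ == "__main__":
    for name, coverage in [("TED Talks", TED_TALKS), ("Keynote 2", KEYNOTE_2)]:
        for threshold in (95.0, 98.0):
            band = first_band_reaching(coverage, threshold)
            print(f"{name}: {threshold:.0f}% coverage reached at {band}")
```

On these figures, the TED Talks reach 95% coverage at the 3rd 1000 band and 98% at the 5th, while Keynote 2 reaches 95% at the 3rd and 98% at the 4th.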
Table 7. Coverage of NAWL in each TED Talk and Keynote 2.
| Unit | Lemma | Freq. | % | In TED Talks | Textbook % | Also Occurs in Unit (Frequency in Unit) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 12 | 12 | 1.27 | Amongst, flesh, ecological, bacteria, absorb, tremendous, unity, predator, critically, prey, ecology, locally | 1.45 | Predator (1), Flesh (2), Critically (2), Prey (2), Ecological (2) |
| 2 | 11 | 13 | 0.89 | Ancestor (s), offspring, continent, minus, migrate, accumulate, sub, sperm, donor, spectrum, tribe | 0.62 | Ancestor (7) |
| 3 | 10 | 17 | 0.91 | Publish (ed, ing), translation, facilitate, clarify, intensive, entities, manuscripts, rituals | 0.64 | Publish (2) |
| 4 | 3 | 3 | 0.54 | Composer, media, prejudice | 0.71 | Composer (1), Prejudice (1) |
| 5 | 18 | 20 | 0.67 | Meaningful, feedback, simplify, indicator, stripes, pole, essentially, horizontal, namely, loop, crude, painful, wheat, depict, reconstruct, founding, underneath, ridiculous | 0.97 | Simplify (1), Meaningful (3) |
| 6 | 15 | 25 | 0.76 | Publish (ed, ing), monkey, classroom (s), maternal, elementary, afterwards, myth, cheated, blank, tempt, halfway, beaming, drained, consciousness, scholarship | 0.64 | Monkey (2), Publish (3), Classroom (2) |
| 7 | 13 | 17 | 1.5 | Lab (s), organs, essentially, goods, sophisticated, sustainable, technically, dimensional, horizon, multiply, matrix, insects, elasticity | 1.83 | Lab (7), Sustainable (2) |
| 8 | 17 | 22 | 1.46 | Incredible, incredibly, impact, consumption, emissions, aspects, sub, importantly, complement, collective, inequality, widespread, tech, authorities, allocate, amongst, afterwards | 0.54 | Consumption (1) |
| 9 | 7 | 7 | 0.67 | Economists, allocating, empirical, click, coordinates, corruption, goods | 0.60 | |
| 10 | 23 | 35 | 2.47 | Detection (s), algorithm (s), neurons, impulse (s), cortex, cognitive, neutral, interface, conscious, explicitly, realm, interact, emits, outer, identical, array, sensitivity, scroll, duration, analogies, goodness, differentiate, robots | 1.68 | Interface (1), Conscious (1) |
| 11 | 5 | 5 | 1.05 | Bats, colony, reproduce, reproduction, naked | 0.71 | Colony (1), Reproduction (2) |
| 12 | 16 | 25 | 2.73 | Incredible, fossil (s), bizarre, partial, dense, ultimate, prey, snakes, specimens, reconstruct, sediment, compact, limbs, fleshed, ruler, lifetime | 1.49 | Bizarre (2), Incredible (2), Fossil (10) |
| TOTAL | 150 | 201 | 1.24 | | 0.99 | |
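Table 8. Frequency per million of the phrasal expressions and academic formulas in each TED Talk and Keynote 2.
| Relative Frequency | MWU Type | TED Talks | Textbook | Examples |
| --- | --- | --- | --- | --- |
| 0 | Phrasal expressions | 0 | 76 | I mean, those who, to do with |
| 0 | Academic formulas | 0 | 60 | you have a, you look at, we can see |
| 1–50 | Phrasal expressions | 0 | 28 | take up, the case, as a result |
| 1–50 | Academic formulas | 0 | 18 | you need to, we look at, to look at |
| 51–100 | Phrasal expressions | 74 | 23 | other than, a range of, as well |
| 51–100 | Academic formulas | 57 | 13 | they do not, to each other, there may be |
| 101–150 | Phrasal expressions | 26 | 5 | in front of, choose to, so that |
| 101–150 | Academic formulas | 20 | 2 | we need to, the number of, at the same time |
| 151–200 | Phrasal expressions | 15 | 4 | a couple of, find out, you see |
| 151–200 | Academic formulas | 10 | 3 | the end of, of the same, if you want to |
| 201–250 | Phrasal expressions | 9 | 1 | of course, think that, no idea |
| 201–250 | Academic formulas | 5 | 0 | you have to, we need to, if you have |
| 251–300 | Phrasal expressions | 6 | 1 | at least, find out, sort of |
| 251–300 | Academic formulas | 1 | 1 | you can see |
| 301–350 | Phrasal expressions | 3 | 1 | look like, work on, right now |
| 351–400 | Phrasal expressions | 0 | 0 | |
| 351–400 | Academic formulas | 3 | 0 | be able to, look at the, you want to |
| 401–450 | Phrasal expressions | 0 | 1 | such a |
| 401–450 | Academic formulas | 1 | 0 | part of the |
| 451–500 | Phrasal expressions | 0 | 0 | |
| 501 and above | Phrasal expressions | 8 | 1 | have to, a lot, a few |
Note: Total MWUs: phrasal expressions, 141; academic formulas, 97.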
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Madarbakus-Ring, N.; Benson, S. TED Talks and the Textbook: An In-Depth Lexical Analysis. Languages 2024, 9, 309. https://doi.org/10.3390/languages9100309
