Representation and Processing of L2 Compositional Multiword Sequences: Effects of Token Frequency, Type Frequency, and Constituency
Abstract
1. Introduction
2. Literature Review
2.1. Token Frequency and Type Frequency
2.2. Major Factors Influencing CMS Processing
3. Method
3.1. Participants
3.2. Stimuli
3.3. Procedures
3.4. Data Analysis
4. Results
4.1. General Processing Patterns
4.2. Token Frequency Effects
4.3. Type Frequency Effects
4.4. Constituency Effects
5. Discussion
5.1. The Effect of Token Frequency in L2 CMS Processing
5.2. The Effect of Type Frequency in L2 CMS Processing
5.3. The Effect of Constituency in L2 CMS Processing
6. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Note 1. We conducted supplementary analyses to assess the robustness of our findings after including participants initially excluded based on QPT scores. The results remained largely consistent with the original analyses, with all key effects retaining significance and demonstrating comparable effect sizes.
Note 2. Repeated-measures ANOVAs conducted by items on log-transformed RTs and ERs produced results largely consistent with the analyses conducted by participants.
Proficiency | N | Age | Gender (M/F) | YFEE | YREC | QPT Score |
---|---|---|---|---|---|---|
HP | 30 | 20.67 (0.55) | 5/25 | 12.73 (0.52) | 0 | 45.07 (1.89) |
LP | 30 | 18.43 (0.50) | 7/23 | 10.47 (0.51) | 0 | 36.20 (2.31) |
Group | Log-Transformed Type Frequency: Mean | Log-Transformed Type Frequency: SD | Entropy: Mean | Entropy: SD |
---|---|---|---|---|
Group A | 3.72 | 0.10 | 8.23 | 0.68 |
Group B | 3.84 | 0.21 | 8.41 | 0.92 |
Group C | 3.25 | 0.29 | 5.62 | 1.14 |
Group D | 3.20 | 0.22 | 5.89 | 0.75 |
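The type-frequency and entropy columns in the table above can be illustrated with a minimal sketch. It assumes that type frequency is the number of distinct fillers attested in a pattern's variable slot and that entropy is the Shannon entropy of the filler distribution. The source does not state how either measure was computed, so this is not the authors' procedure. The function names and the filler counts are hypothetical and are not the study's data.

```python
import math
from collections import Counter


def type_frequency(filler_counts):
    """Number of distinct fillers attested in the slot (assumed definition)."""
    return sum(1 for count in filler_counts.values() if count > 0)


def shannon_entropy(filler_counts):
    """Shannon entropy, in bits, of the filler distribution (assumed definition)."""
    total = sum(filler_counts.values())
    entropy = 0.0
    for count in filler_counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical filler counts for one variable slot; invented for illustration.
even_slot = Counter({"end": 25, "middle": 25, "top": 25, "rest": 25})
skewed_slot = Counter({"end": 97, "middle": 1, "top": 1, "rest": 1})

for name, slot in [("even", even_slot), ("skewed", skewed_slot)]:
    print(name, type_frequency(slot), round(shannon_entropy(slot), 2))
```

Both hypothetical slots have the same type frequency, but the evenly distributed one has higher entropy. This shows why the two measures can be reported side by side: type frequency counts the fillers, while entropy also reflects how evenly they are used.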
Group | Word Length: Mean | Word Length: SD | Log-Transformed Word Token Frequency: Mean | Log-Transformed Word Token Frequency: SD | Log-Transformed Bi-Gram Token Frequency: Mean | Log-Transformed Bi-Gram Token Frequency: SD | Log-Transformed Whole-String Token Frequency: Mean | Log-Transformed Whole-String Token Frequency: SD |
---|---|---|---|---|---|---|---|---|
Group 1 | 4.18 | 2.13 | 6.13 | 0.99 | 4.41 | 0.59 | 3.35 | 0.28 |
Group 2 | 4.09 | 2.39 | 6.41 | 1.00 | 4.32 | 0.71 | 3.39 | 0.28 |
Group 3 | 4.03 | 2.08 | 6.13 | 1.00 | 4.24 | 0.78 | 2.31 | 0.10 |
Group 4 | 4.03 | 2.38 | 6.42 | 1.01 | 4.17 | 0.84 | 2.31 | 0.10 |
Group 5 | 3.76 | 1.44 | 6.17 | 0.92 | 4.34 | 0.60 | 3.36 | 0.29 |
Group 6 | 3.79 | 1.85 | 6.24 | 0.89 | 4.19 | 0.58 | 3.39 | 0.25 |
Group 7 | 3.88 | 1.52 | 6.14 | 0.96 | 4.26 | 0.71 | 2.24 | 0.23 |
Group 8 | 3.67 | 1.78 | 6.36 | 0.74 | 4.17 | 0.48 | 2.24 | 0.30 |
Measure | Proficiency | Group 1 | Group 2 | Group 3 | Group 4 | Group 5 | Group 6 | Group 7 | Group 8 |
---|---|---|---|---|---|---|---|---|---|
LogRTs | HP | 3.11 (0.09) | 3.14 (0.08) | 3.13 (0.11) | 3.19 (0.09) | 3.12 (0.08) | 3.18 (0.09) | 3.17 (0.09) | 3.24 (0.08) |
LogRTs | LP | 3.15 (0.09) | 3.22 (0.07) | 3.20 (0.09) | 3.27 (0.09) | 3.20 (0.07) | 3.27 (0.08) | 3.24 (0.08) | 3.28 (0.09) |
ERs | HP | 1.52 (3.45) | 3.64 (4.53) | 2.12 (3.91) | 7.58 (5.89) | 3.03 (4.36) | 6.36 (6.38) | 6.97 (5.69) | 10.30 (9.47) |
ERs | LP | 2.73 (4.24) | 5.15 (4.58) | 3.03 (4.36) | 11.51 (9.53) | 7.88 (7.05) | 7.58 (7.95) | 8.18 (7.30) | 15.46 (10.44) |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Xu, Y.; Yu, Y. Representation and Processing of L2 Compositional Multiword Sequences: Effects of Token Frequency, Type Frequency, and Constituency. Behav. Sci. 2025, 15, 734. https://doi.org/10.3390/bs15060734