AI Language Models: An Opportunity to Enhance Language Learning
Abstract
1. Introduction
1.1. Scope and Overview
1.2. Research Questions and Hypotheses
2. Background
2.1. Computational Approach to L2 Writing Proficiency
2.2. Contextualized Meaning Representations in LLMs
3. Materials and Methods
3.1. Text Preprocessing
3.2. Technical Details
3.3. Experimental Design
- (a) Independent variables, named in the format LLM_measurement-unit_statistic. For example, bert_mv5_q5 denotes a BERT-large-uncased-derived metric: the average similarity of each pair of words within a 5-word moving window (mv5) is computed across the whole sample, and the 5th percentile (Q5) of those window values is taken as the statistic.
    - LLMs: BERT-large-uncased; T5-large; Llama2.
    - Measurement unit:
        - Within a sentence:
            - mv5/10: average similarity of each pair of words in a 5-word (mv5) or 10-word (mv10) moving window.
            - k1:10: pairwise word-to-word similarity at inter-word distance k, with k ranging from 1 to 10.
        - Beyond a sentence:
            - foc: first-order coherence, the cosine similarity of adjacent sentence pairs.
            - soc: second-order coherence, the cosine similarity of sentence pairs with one intervening sentence.
    - Statistics: Mean; Median; Q5; Q95; IQR.
- (b) Dependent variables, validated and provided by the L1 and L2 corpora:
    - MTELP_Conv_Score: MTELP total combined proficiency score.
    - level_id: the proficiency level of the speaker.
    - Writing_Sample: in-house writing test score on a 1–6 scale; a higher score indicates higher writing proficiency.
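The similarity metrics defined above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's implementation: the toy 2-d vectors stand in for contextualized embeddings, which in practice would be extracted from BERT-large-uncased, T5-large, or Llama2, and all function names here are my own.

```python
import math
from itertools import combinations
from statistics import median

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def mv(word_vecs, window=5):
    """mv5/mv10: mean pairwise similarity inside each moving window.

    Returns one value per window position; a summary statistic
    (mean, median, Q5, Q95, IQR) is then applied to this list.
    """
    window_means = []
    for start in range(len(word_vecs) - window + 1):
        pairs = [cosine(u, v)
                 for u, v in combinations(word_vecs[start:start + window], 2)]
        window_means.append(sum(pairs) / len(pairs))
    return window_means

def k_similarity(word_vecs, k):
    """k1:10: similarity of word pairs exactly k positions apart."""
    return [cosine(word_vecs[i], word_vecs[i + k])
            for i in range(len(word_vecs) - k)]

def foc(sent_vecs):
    """First-order coherence: similarity of adjacent sentence embeddings."""
    return [cosine(sent_vecs[i], sent_vecs[i + 1])
            for i in range(len(sent_vecs) - 1)]

def soc(sent_vecs):
    """Second-order coherence: similarity of sentences with one intervening sentence."""
    return [cosine(sent_vecs[i], sent_vecs[i + 2])
            for i in range(len(sent_vecs) - 2)]

# Toy 2-d "embeddings" standing in for real LLM representations.
vecs = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [0.6, 0.8], [1.0, 0.0], [0.5, 0.5]]
print(round(median(mv(vecs, window=5)), 3))   # analogous to a *_mv5_median metric
print([round(x, 3) for x in foc(vecs)])
```

In the paper's naming scheme, a metric such as bert_mv5_q5 would apply the 5th percentile rather than the median to the per-window values returned by `mv`.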
4. Results
4.1. LLM Similarities Detect L2 and L1 Writing
4.1.1. Word-Level Metrics
4.1.2. Sentence-Level Metrics
4.1.3. Regression Model Comparison
4.2. LLM Similarities Index L2 Proficiency Levels
4.2.1. Word-Level Metrics
4.2.2. Sentence-Level Metrics
4.3. LLM Similarities Correlate with Overall Scores and Writing Scores
5. Conclusions and Discussion
5.1. Interpreting LLM Similarity Scores in an L2 Setting
5.2. LLM Implications in Language Learning and Teaching
5.3. AI Tool Usage in Education
5.4. Limitations and Future Directions
Supplementary Materials
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
| | Mean | Min | Max | Std |
|---|---|---|---|---|
| L1 | 134.564 | 51 | 299 | 58.802 |
| L2 | 162.647 | 54 | 298 | 60.693 |
| | BERT | Llama2 | T5 |
|---|---|---|---|
| mv5/10 | 0.91 | 0.79 | 0.66 |
| k1:10 | 0.95 | 0.89 | 0.92 |
| foc/soc | 0.70 | 0.72 | 0.73 |
| BERT | Overall | Writing | Llama2 | Overall | Writing | T5 | Overall | Writing |
|---|---|---|---|---|---|---|---|---|
| foc_iqr | 0.03 | −0.01 | foc_iqr | 0.02 | 0.05 | foc_iqr | 0.03 | 0.09 *** |
| foc_median | 0.00 | 0.06 ** | foc_median | 0.07 ** | 0.08 *** | foc_median | −0.05 * | −0.01 |
| soc_iqr | −0.05 | −0.02 | soc_iqr | −0.06 * | −0.02 | soc_iqr | −0.02 | 0.02 |
| soc_median | −0.02 | 0.05 | soc_median | 0.01 | 0.06 ** | soc_median | −0.09 *** | −0.01 |
| mv10_iqr | −0.03 | −0.01 | mv10_iqr | 0.03 | 0.03 | mv10_iqr | −0.01 | 0.02 |
| mv10_median | −0.15 *** | −0.05 | mv10_median | 0.07 ** | 0.08 *** | mv10_median | −0.11 *** | −0.06 ** |
| mv5_iqr | 0.01 | 0.03 | mv5_iqr | 0.03 | 0.01 | mv5_iqr | 0.04 | 0.03 |
| mv5_median | −0.13 *** | −0.04 | mv5_median | 0.09 *** | 0.08 *** | mv5_median | −0.05 * | 0.02 |
© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Cong, Y. AI Language Models: An Opportunity to Enhance Language Learning. Informatics 2024, 11, 49. https://doi.org/10.3390/informatics11030049