# Modeling Word Learning and Processing with Recurrent Neural Networks

## Abstract


## 1. Introduction

## 2. Related Work

## 3. Materials and Methods

### 3.1. The Data

### 3.2. The Neural Networks

### 3.3. Training Protocol

## 4. Results

### 4.1. Training and Test Accuracy

### 4.2. Prediction Scores

### 4.3. Modeling Serial Processing

#### 4.3.1. Prediction (1): Structural Effects


#### 4.3.2. Prediction (2): Serial Processing Effects

GAMs fit the prediction data with R² values of 85.5% and 70.1% for the Italian training and test sets respectively, and 88.6% and 68.0% for the German sets, although the variance is fairly high for all sets (i.e., for each serial position, values for the prediction task in the LSTM are highly dispersed around the mean). Specifically, for both languages and for both training and test sets, GAMs fitted symbol prediction with distance to the morpheme boundary (MB), word length, and morphological (ir)regularity as fixed effects, with distance to the MB and word length additionally entered as smooth terms.
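As an illustration of this modeling step, the following minimal Python sketch fits a comparable GAM with the pygam library. The paper does not specify the fitting software, and the data, predictor layout, and coefficients below are hypothetical; only the model structure (two smooth terms plus a categorical fixed effect) mirrors the description above.

```python
# Minimal sketch (not the paper's original pipeline): a GAM with distance to
# the morpheme boundary and word length as smooth terms, and morphological
# (ir)regularity as a categorical fixed effect, fitted with pygam.
import numpy as np
from pygam import LinearGAM, s, f

rng = np.random.default_rng(0)
n = 500
# Hypothetical predictors: column 0 = distance to the MB (negative = before
# the boundary), column 1 = word length, column 2 = regularity (0/1 flag).
X = np.column_stack([
    rng.integers(-6, 7, n),
    rng.integers(4, 15, n),
    rng.integers(0, 2, n),
])
# Hypothetical response: number of correctly predicted symbols.
y = 3 + 0.4 * X[:, 0] - 0.1 * X[:, 1] - 0.8 * X[:, 2] + rng.normal(0, 1, n)

# Smooth terms for the two numeric predictors, factor term for regularity.
gam = LinearGAM(s(0) + s(1) + f(2)).fit(X, y)
gam.summary()  # reports pseudo R² among other fit statistics
```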

## 5. Discussion

## 6. Conclusions

## Funding

## Acknowledgments

## Conflicts of Interest

## Abbreviations

Abbreviation | Definition |
---|---|
RNN | Recurrent Neural Network |
LSTM | Long Short-Term Memory |
TSOM | Temporal Self-Organizing Map |
BMU | Best Matching Unit |
MB | Morpheme Boundary |

## References

- Harris, Z. Methods in Structural Linguistics; University of Chicago Press: Chicago, IL, USA, 1951.
- Post, B.; Marslen-Wilson, W.; Randall, B.; Tyler, L.K. The processing of English regular inflections: Phonological cues to morphological structure. Cognition **2008**, 109, 1–17.
- D’Esposito, M. From cognitive to neural models of working memory. Philos. Trans. R. Soc. Biol. Sci. **2007**, 362, 761–772.
- Ma, W.J.; Husain, M.; Bays, P.M. Changing concepts of working memory. Nat. Neurosci. **2014**, 17, 347–356.
- Bengio, Y.; Simard, P.; Frasconi, P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. **1994**, 5, 157–166.
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. **1997**, 9, 1735–1780.
- Jozefowicz, R.; Zaremba, W.; Sutskever, I. An empirical exploration of recurrent network architectures. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), Lille, France, 6–11 July 2015; pp. 2342–2350.
- Ferro, M.; Marzi, C.; Pirrelli, V. A self-organizing model of word storage and processing: Implications for morphology learning. Lingue Linguaggio **2011**, 10, 209–226.
- Marzi, C.; Ferro, M.; Nahli, O. Arabic word processing and morphology induction through adaptive memory self-organisation strategies. J. King Saud Univ. Comput. Inf. Sci. **2017**, 29, 179–188.
- Pirrelli, V.; Ferro, M.; Marzi, C. Computational complexity of abstractive morphology. In Understanding and Measuring Morphological Complexity; Baerman, M., Brown, D., Corbett, G., Eds.; Oxford University Press: Oxford, UK, 2015; pp. 141–166.
- Marzi, C.; Ferro, M.; Pirrelli, V. A processing-oriented investigation of inflectional complexity. Front. Commun. **2019**, 4, 1–48.
- Cardillo, F.A.; Ferro, M.; Marzi, C.; Pirrelli, V. How “deep” is learning word inflection? In Proceedings of the 4th Italian Conference on Computational Linguistics, Rome, Italy, 11–12 December 2017; pp. 77–82.
- Mikolov, T.; Karafiát, M.; Burget, L.; Černocký, J.; Khudanpur, S. Recurrent neural network based language model. In Proceedings of the 11th Annual Conference of the International Speech Communication Association, Chiba, Japan, 26–30 September 2010; pp. 1045–1048.
- Botvinick, M.; Plaut, D.C. Short-term memory for serial order: A recurrent neural network model. Psychol. Rev. **2006**, 113, 201–233.
- Bowers, J.S.; Damian, M.F.; Davis, C.J. A fundamental limitation of the conjunctive codes learned in PDP models of cognition: Comment on Botvinick and Plaut (2006). Psychol. Rev. **2009**, 116, 986–997.
- Elman, J.L. Finding structure in time. Cogn. Sci. **1990**, 14, 179–211.
- Jaech, A.; Ostendorf, M. Personalized language model for query auto-completion. arXiv **2018**, arXiv:1804.09661.
- Gers, F.A.; Schmidhuber, J. LSTM recurrent networks learn simple context-free and context-sensitive languages. IEEE Trans. Neural Netw. **2001**, 12, 1333–1340.
- Ghosh, S.; Vinyals, O.; Strope, B.; Roy, S.; Dean, T.; Heck, L. Contextual LSTM (CLSTM) models for large scale NLP tasks. arXiv **2016**, arXiv:1602.06291.
- Xu, K.; Xie, L.; Yao, K. Investigating LSTM for punctuation prediction. In Proceedings of the 10th International Symposium on Chinese Spoken Language Processing (ISCSLP), Tianjin, China, 17–20 October 2016; pp. 1–5.
- Malouf, R. Generating morphological paradigms with a recurrent neural network. San Diego Linguist. Pap. **2016**, 6, 122–129.
- Cardillo, F.A.; Ferro, M.; Marzi, C.; Pirrelli, V. Deep Learning of Inflection and the Cell-Filling Problem. Ital. J. Comput. Linguist. **2018**, 4, 57–75.
- Marzi, C.; Ferro, M.; Nahli, O.; Belik, P.; Bompolas, S.; Pirrelli, V. Evaluating inflectional complexity crosslinguistically: A processing perspective. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, 7–12 May 2018; pp. 3860–3866.
- Lyding, V.; Stemle, E.; Borghetti, C.; Brunello, M.; Castagnoli, S.; Dell’Orletta, F.; Dittmann, H.; Lenci, A.; Pirrelli, V. The PAISÀ corpus of Italian web texts. In Proceedings of the 9th Web as Corpus Workshop (WaC-9) @ EACL, Gothenburg, Sweden, 26 April 2014; pp. 36–43.
- Baayen, R.H.; Piepenbrock, R.; Gulikers, L. The CELEX Lexical Database; Linguistic Data Consortium, University of Pennsylvania: Philadelphia, PA, USA, 1995.
- Aronoff, M. Morphology by Itself: Stems and Inflectional Classes; The MIT Press: Cambridge, MA, USA, 1994.
- Bittner, D.; Dressler, W.U.; Kilani-Schoch, M. Development of Verb Inflection in First Language Acquisition: A Cross-Linguistic Perspective; Mouton de Gruyter: Berlin, Germany, 2003.
- Wu, S.; Cotterell, R.; O’Donnell, T.J. Morphological Irregularity Correlates with Frequency. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 5117–5126.
- Chen, Q.; Mirman, D. Competition and Cooperation Among Similar Representations: Toward a Unified Account of Facilitative and Inhibitory Effects of Lexical Neighbors. Psychol. Rev. **2012**, 119, 417–430.
- Balling, L.W.; Baayen, R.H. Morphological effects in auditory word recognition: Evidence from Danish. Lang. Cogn. Process. **2008**, 23, 1156–1190.
- Balling, L.W.; Baayen, R.H. Probability and surprisal in auditory comprehension of morphologically complex words. Cognition **2012**, 125, 80–106.

Sample Availability: Datasets are available from the author.

**Figure 1.** Functional architecture of the recurrent LSTM. Input vector dimensions are given in parentheses. Inputs are mapped onto the projection layer z(t), which feeds the recurrent layer LSTM(t). Adapted from [12].
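As a concrete reading of this caption, here is a minimal PyTorch sketch of such a character-level next-symbol predictor. It is an illustration under stated assumptions, not the author's implementation: the class name, projection size, and symbol indices are invented, and only the pipeline (input symbols → projection layer z(t) → recurrent LSTM layer → scores over the alphabet) follows Figure 1.

```python
# Minimal sketch of the Figure 1 pipeline (illustrative, not the original code).
import torch
import torch.nn as nn

class CharLSTM(nn.Module):
    def __init__(self, n_symbols: int, proj_dim: int = 64, hidden_dim: int = 512):
        super().__init__()
        self.embed = nn.Embedding(n_symbols, proj_dim)   # projection layer z(t)
        self.lstm = nn.LSTM(proj_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_symbols)      # scores over the alphabet

    def forward(self, x):
        z = self.embed(x)            # (batch, time, proj_dim)
        h, _ = self.lstm(z)          # (batch, time, hidden_dim)
        return self.out(h)           # next-symbol logits at each time step

# Usage: predict the symbol following each position of "#pop"
# (symbol-to-index mapping is hypothetical).
model = CharLSTM(n_symbols=30)
tokens = torch.tensor([[0, 17, 16, 17]])   # e.g., "#", "p", "o", "p"
logits = model(tokens)
print(logits.shape)                         # torch.Size([1, 4, 30])
```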

**Figure 2.** Functional architecture of a Temporal Self-Organizing Map (TSOM). As an example, map nodes show the activation pattern for the input string #pop$. Directed arcs stand for forward Hebbian connections between BMUs. “#” and “$” mark the start and end of input words, respectively.
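To make the caption's two ingredients concrete (best-matching-unit selection over input connections, and forward Hebbian connections between temporally adjacent BMUs), here is a deliberately simplified Python sketch. It compresses many details of the published TSOM model; the map size, learning rate, and activation rule are toy assumptions.

```python
# Toy sketch of TSOM dynamics (illustrative only, not the authors' model).
import numpy as np

rng = np.random.default_rng(0)
SYMS = sorted(set("#pop$"))           # toy symbol inventory; "#"/"$" delimit words
N = 16                                # toy 4 x 4 map

input_w = rng.random((N, len(SYMS)))  # input connections: node x symbol code
temp_w = np.zeros((N, N))             # temporal (Hebbian) connections: node x node

def one_hot(ch):
    v = np.zeros(len(SYMS))
    v[SYMS.index(ch)] = 1.0
    return v

def process(word, alpha=0.1):
    """Return the BMU chain for `word`, updating weights along the way."""
    prev_bmu, path = None, []
    for ch in word:
        x = one_hot(ch)
        # Activation combines the input match with the temporal expectation
        # propagated from the previous BMU.
        act = -np.linalg.norm(input_w - x, axis=1)
        if prev_bmu is not None:
            act = act + temp_w[prev_bmu]
        bmu = int(np.argmax(act))
        input_w[bmu] += alpha * (x - input_w[bmu])  # move BMU toward the input
        if prev_bmu is not None:
            temp_w[prev_bmu, bmu] += alpha          # strengthen the forward arc
        prev_bmu = bmu
        path.append(bmu)
    return path

print(process("#pop$"))  # BMU chain for the example string in Figure 2
```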

**Figure 3.** Regression plots of the interaction between morphological (ir)regularity (Regulars versus Irregulars) and distance to the morpheme boundary (MB) in non-linear models (GAMs) fitting the number of correctly anticipated symbols by the trained LSTMs on Italian (**left plots**) and German (**right plots**), for training and test sets.

**Figure 4.** Regression plots of the interaction between morphological (ir)regularity (Regulars versus Irregulars) and distance to the morpheme boundary (MB) in non-linear models (GAMs) fitting the number of correctly anticipated symbols by the trained TSOMs on Italian (**left plots**) and German (**right plots**), for training and test sets.

**Figure 5.** Regression plots of the interaction between morphological (ir)regularity (Regulars versus Irregulars) and distance to the morpheme boundary (MB) in non-linear models (GAMs) fitting the incremental number of correctly predicted symbols by the trained LSTMs on Italian (**left plots**) and German (**right plots**), for training and test sets.

**Figure 6.** Regression plots of the interaction between morphological (ir)regularity (Regulars versus Irregulars) and distance to the morpheme boundary (MB) in non-linear models (GAMs) fitting the incremental number of correctly predicted symbols by the trained TSOMs on Italian (**left plots**) and German (**right plots**), for training and test sets.

**Table 1.** Dataset summary for each language.

Language | Word Types | Regular/Irregular Paradigms | Number of Characters | Maximum Length of Forms |
---|---|---|---|---|
Italian | 750 | 23/27 | 22 | 14 |
German | 750 | 16/34 | 28 | 13 |

**Table 2.** Accuracy values for each language, neural network, and training protocol. Standard deviations are given in parentheses.

LSTMs | Training | Test | TSOMs | Training | Test |
---|---|---|---|---|---|
Italian: 512-blocks | 93.55 (1.16) | 68.73 (5.54) | Italian: 42 × 42 nodes | 99.92 (0.13) | 95.62 (1.66) |
German: 256-blocks | 97.25 (0.65) | 74.54 (6.06) | German: 40 × 40 nodes | 99.88 (0.11) | 100 (0) |

© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
