1. Introduction
Interest in iconicity is on the rise within cognitive sciences, psychology, and linguistics (
Nielsen and Dingemanse 2021). Although an arbitrary word has no link between its phonological structure and its meaning (it is meaningful by convention only), every natural language contains a portion of non-arbitrary elements that sound (in spoken languages) or are articulated (in sign languages) like what they mean (
Blasi et al. 2016;
Joo 2020;
Östling et al. 2018). It has been shown that words consisting of specific combinations of vowels and consonants are non-arbitrarily associated with specific sensorimotor and emotional features (
Kawahara 2020). An iconic form-meaning relationship is perceived directly both visually and aurally from the physical form of a word. This resemblance-based mapping between form and meaning is caused by synaesthesia (
Ramachandran and Hubbard 2001) and cross-modality (
Sidhu and Pexman 2018a). Iconicity is generally considered a universal feature of language: words with at least some degree of iconicity are attested in all languages across the world (
Akita and Pardeshi 2019;
Voeltz and Kilian-Hatz 2001).
The share of imitative words in the lexicon of a language depends on a particular scholar’s view on iconicity—whether all the words with historically iconic roots should be considered or just modern vivid interjections like
brr or
oof. The fact that languages develop over time suggests that iconic words also continuously appear, and then, over the course of their development, lose their iconic traits (
Flaksman 2017). Nevertheless, unlike many non-Indo-European languages, which are rich in iconic words, Indo-European languages (e.g., English and Russian) are generally seen as relatively arbitrary (
Imai and Kita 2014) (this comparison is based on the number of imitative words and their share in the respective lexicons; for discussion, see (
Bartens 2000, pp. 41–43;
Hinton et al. 1994, introduction)). In terms of morphology, for example, a comparative study of phonetic iconicity in Slavic, Germanic, Romance, and Finno-Ugric languages revealed no language-general tendency for front high vowels and front consonants to dominate in diminutive affixes, or for high back vowels and back consonants to dominate in augmentative ones (
Stekauer et al. 2009).
Traditionally, researchers interested in iconicity have focused on its auditory perception, since iconic words are regarded as part of spoken language, often accompanied by gestures, rather than as part of written language (
Voeltz and Kilian-Hatz 2001). Although the written sign system of a language is basically conventional, iconic items do occur there. This is especially true of languages with alphabetic writing systems, where sound sequences are coded with corresponding letter sequences. Moreover, iconic items are essential for literary texts: “The iconicity of linguistic sound, which plays a minor part in language as a system, plays a leading character in literature, especially in poetry” (
Johansen 1993, p. 227). The question raised in the present research is whether or not the users of two Indo-European languages from different groups, Germanic and Slavic, and of different morphological types, analytic (English) and synthetic (Russian), are sensitive to spoken word iconicity represented in writing and, if so, to what extent.
To date, there have been only a few experiments addressing the visual perception of spoken iconicity in these languages. For instance, a psycholinguistic study by
Monaghan and Fletcher (
2019) indicated that, for speakers of English, the visual processing of individual phoneme-meaning correlations (the relations between the manner of articulation and the meaning) in non-words was more important than cross-modal associations. To the best of our knowledge, no study has investigated the visual recognition of English iconic words by native English speakers in a lexical decision task. The study by
Sidhu et al. (
2020) examined the recognition of English iconic words in a visual lexical decision task by undergraduate students with a fluent command of English. The findings revealed faster and more accurate responses to words higher in iconicity. Studies of Russian iconicity are scarce. The experimental study by
Tkacheva et al. (
2019) of the visual perception of Russian and English iconic words by Russian adults (N = 148) found that iconic words were identified more slowly and with a greater number of errors than non-iconic words. These findings are explained by the cognitive complexity of recognizing iconic words, which contain both semantic and figurative information. The linguistic study by
Flaksman (
2020) has shown that the less iconic a Russian word is, the more it is morphologically marked, which is typical of a synthetic language.
It is noteworthy that most psycholinguistic studies regard iconicity as a discrete property that is either present or absent (
Dingemanse et al. 2020). Furthermore, the degree of iconicity is established not by means of (historical) linguistics but measured over the course of an experiment.
Perry et al. (
2015) hypothesized that words belonging to different grammatical categories exhibit different degrees of iconicity across languages. The results of their experiments showed that English native speakers rated visually presented onomatopoeic words (e.g., moo) and interjections (e.g., ouch) as most iconic, while nouns (e.g., jeans) and grammatical words (e.g., here) were rated as least iconic. The authors also claimed that words that sound like what they mean are the earliest words learned by both L1 and L2 learners of English.
Winter et al. (
2017) replicated these findings using native speaker ratings of the degree of iconicity of English words (N = 3001). The results showed that English sensory words, particularly those related to sound and touch, are more iconic than abstract ones. Altogether, their results indicated that English sensory words are more strongly related to iconicity than to systematicity.
The current study builds on previous research by taking into consideration the concept of the de-iconization stage of iconic words, i.e., the degree of iconic quality and form-meaning resemblance (
Flaksman 2015). The criteria for the classification of imitative words according to de-iconization stages are: (1) morphological and syntactic integration; (2) influence of regular sound changes; and (3) influence of semantic shifts (ibid.). Applying these criteria makes it possible to distinguish words at four de-iconization stages (SD-1 to SD-4), which co-exist in the language. Criterion (1) separates iconic interjections and ideophones from the syntactically and morphologically integrated imitative nouns, verbs, etc. The latter can be further subdivided into words with or without changes in form and/or meaning (
Flaksman 2015, pp. 128–40):
Words at SD-1 are the most vivid, imitative words, mainly interjections that have not changed their form or meaning (e.g., Eng. grr! (int.): ‘a sign of anger or annoyance’; Rus. tpruu! (int.): ‘an exclamation, predominantly applied to horses as a command to stop’).
Words at SD-2 are content words that retain their original sound-related meaning and have not yet undergone any regular sound changes (e.g., Eng. bleep (N): ‘a short high-pitched sound made by an electronic device as a signal’; Rus. gul (N): ‘a continuous low noise’).
Words at SD-3 have lost either their original form through significant sound changes or their original (sound-related) meaning through semantic shifts (e.g., Eng. bib (N): ‘a piece of clothing fastened round a child’s neck to keep the clothes clean while eating’, probably from Latin bibere ‘to drink’; Rus. mops (N): ‘a pug’, a borrowing from Dutch or German Mops, from the verb with the original meaning ‘to look sulky, to pout’, cf. Dutch moppen (1678) ‘to mutter, mumble, sulk, pull a face’ and English mope ‘to make a grimace, to make faces’, perhaps imitative of movements of the lips, according to OED).
Words at SD-4 have fully lost their imitative quality and remain iconic by origin only (e.g., Eng. craze (N): ‘an enthusiasm for a particular activity’, late ME in the sense ‘break, produce cracks’; Rus. klok (N): ‘a tuft’, originally a sound-symbolic word, according to
Koleva-Zlateva (
2008, p. 162)).
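The way the three criteria sort imitative words into stages can be sketched as a small decision procedure. The sketch below is only an illustration under stated assumptions: the `ImitativeWord` record and the `deiconization_stage` function are hypothetical names, the boolean flags are a simplification of criteria (1) to (3), SD-4 is assumed to mean that both form and meaning have changed, and the flag values given to the example words are illustrative rather than taken from Flaksman's analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImitativeWord:
    """Hypothetical record; the flags simplify criteria (1)-(3) from the text."""
    form: str
    integrated: bool       # criterion (1): morphologically and syntactically integrated
    sound_changed: bool    # criterion (2): form altered by regular sound changes
    meaning_shifted: bool  # criterion (3): original sound-related meaning lost


def deiconization_stage(word: ImitativeWord) -> str:
    """Assign a de-iconization stage under the assumptions stated above."""
    # Criterion (1): non-integrated interjections and ideophones are SD-1.
    if not word.integrated:
        return "SD-1"
    # Count how many of the two kinds of change have affected the word.
    changes = int(word.sound_changed) + int(word.meaning_shifted)
    if changes == 0:
        return "SD-2"  # integrated, form and meaning intact
    if changes == 1:
        return "SD-3"  # lost either form or meaning
    return "SD-4"      # assumption: both lost, iconic by origin only


# Illustrative flag values only, not an etymological analysis.
examples = [
    ImitativeWord("grr", integrated=False, sound_changed=False, meaning_shifted=False),
    ImitativeWord("bleep", integrated=True, sound_changed=False, meaning_shifted=False),
    ImitativeWord("mops", integrated=True, sound_changed=False, meaning_shifted=True),
]

for w in examples:
    print(w.form, deiconization_stage(w))
```

Treating the stage as a function of a few binary properties makes explicit that SD-1 is separated by criterion (1) alone, whereas the later stages depend on how many of the remaining criteria apply.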
Flaksman (
2015, pp. 128–31) emphasized that SD-1 words differ from the rest for several reasons: this de-iconization stage is optional, as there is evidence that many imitative words are coined directly as SD-2 words; they exhibit phonetic hyper-variation and instability of form (expressive ablaut, gemination, expressive vowel lengthening, metathesis); they can be reduplicated (partially or totally) without a change in meaning; and they are phonosemantically inert (
Flaksman 2013), that is, they are not affected by regular sound changes (unless they have proceeded to SD-2).
Furthermore, although there is a significant body of research exploring visual word recognition, debate continues as to how lexical access proceeds. Previous research (
Sidhu et al. 2020;
Aryani et al. 2019) has found a facilitatory effect of iconicity on the lexical processing of visually presented stimuli. The question is which additional factors, besides iconicity itself, can affect the visual recognition of iconic words. Potentially interfering factors include the frequency of words (
Winter et al. 2017) and their neighborhood density (
Sidhu and Pexman 2018b). Generally, in English, 71.5% of all words are monosyllabic (
Gitt 2006), which results in a higher neighborhood density for monosyllabic experimental and non-word stimuli in English than in Russian. This may be an important contributing factor to the speed and accuracy of word recognition. In addition, as noted above, words consisting of specific combinations of vowels and consonants are non-arbitrarily associated with specific sensorimotor and emotional features (
Kawahara 2020), i.e., that different kinds of non-arbitrary mappings can affect the process of recognition when appearing in the same stimulus (
Sidhu et al. 2020). Furthermore, sound-symbolic phenomena may occur also in non-words due to specific combinations of letters in them (
Sidhu and Pexman 2019). Moreover, errors, delays, and inaccuracies of recognition are typical of words that do not conform to the sound structure of the target language (
Styles and Gawne 2017).
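Because neighborhood density recurs as a control variable, a short sketch may help make the notion concrete. It assumes the common orthographic definition (Coltheart's N: the number of same-length words that differ from the target by exactly one letter substitution); the text does not state which measure was used in the present study, and the function names and the toy lexicon below are hypothetical.

```python
def is_neighbor(a: str, b: str) -> bool:
    """True if b differs from a by exactly one substituted letter (same length)."""
    if len(a) != len(b) or a == b:
        return False
    return sum(x != y for x, y in zip(a, b)) == 1


def neighborhood_density(target: str, lexicon: list[str]) -> int:
    """Count the distinct lexicon entries that are one-substitution neighbors."""
    return sum(is_neighbor(target, word) for word in set(lexicon))


# Toy lexicon for illustration only, not the stimuli of the study.
toy_lexicon = ["puff", "huff", "buff", "cuff", "pull", "buzz", "bleep"]

print(neighborhood_density("puff", toy_lexicon))   # 3 (huff, buff, cuff)
print(neighborhood_density("bleep", toy_lexicon))  # 0
```

Short, monosyllabic strings tend to have more such neighbors than longer ones, which is why a lexicon with many monosyllabic words yields higher densities for its stimuli.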
There are also psycholinguistic features of target word stimuli to consider, such as familiarity, imageability, emotional valence, and emotional arousal, which can affect the process of visual recognition when filled with an individual meaning for the participant (
Citron et al. 2014). It is known that conceptual recognition relies on the perceptual system, and, sometimes, perceptual–conceptual interference occurs, where perceptual stimulation in a particular sensory modality leads to slower or less accurate recognition of information from the same modality (
Vermeulen et al. 2008); however, there is the reverse process, called perceptual–conceptual facilitation, wherein perceptual stimulation leads to faster and more accurate recognition (
Connell and Lynott 2012b). For example, words referring to concepts with a strong visual component have been shown to be recognized faster and more accurately in a lexical decision task than non-visual words of similar length and frequency (
Connell and Lynott 2012a).
We propose an alternative framework for studying sensitivity to visually presented iconicity in two typologically different Indo-European languages, English (analytic) and Russian (synthetic), based on the division of iconic words into four de-iconization stages, an additional explanatory variable that was neglected in previous research. We aim to learn how English and Russian participants recognize visually presented native iconic words in comparison with arbitrary words and non-words. We attempt to identify which factors may affect iconic word recognition in each language, with lexical decision time as the dependent variable. The independent variables are word frequency, neighborhood density, type of stimulus (non-words, arbitrary words, iconic words belonging to the four de-iconization stages), and the individual differences of the subjects. Based on the results of previous studies, we posit the following hypotheses:
Hypothesis 1 (H1). The most explicit iconic words are recognized more slowly and less accurately than the other content words by native speakers of Russian.
Hypothesis 2 (H2). Speakers of each language are equally sensitive to iconicity and show a similar recognition pattern when perceiving explicit iconic words (SD-1, SD-2).
4. Discussion
First of all, the independent variables included in the analysis had different effects on the results. Neighborhood density had no statistically significant effect, whereas including word frequency accounted for a significant proportion of the variance in the dependent variable and thereby made the differences between stimulus types more pronounced. We hypothesized that the most explicit iconic Russian words would be recognized more slowly and less accurately than the other content words by native speakers of Russian (H1). According to our second hypothesis (H2), we expected similar recognition patterns in both Russian and English speakers for explicit iconic words (SD-1 and SD-2), even though the results of previous studies suggested that English iconic words are recognized faster (
Sidhu et al. 2020), and we assumed that the degree of iconicity plays a crucial role and that iconicity itself is a general feature of language. Indeed, the most explicit iconic Russian words (SD-1) are recognized more slowly and less accurately than all other words. However, Russian words belonging to SD-2 are recognized faster and the most accurately in comparison with all other words. As for the words of SD-3, they are recognized faster and more accurately than arbitrary words. SD-4 words are recognized similarly to arbitrary words in terms of speed and accuracy. Given this, our first hypothesis is confirmed. We can state that native speakers of Russian show a high sensitivity to the most explicit iconic words. However, less explicit iconic words (SD-2, SD-3, SD-4) differ significantly from SD-1 in their recognition patterns, and the tendency is that the less iconic they are, the closer they are to arbitrary words in terms of the speed and accuracy of their recognition. We suppose that SD-1 words are recognized more slowly and less accurately because of the cognitive complexity of the task, which requires decoding information on both the lexical and the figural levels. It is likely that the most explicit iconic words activate cross-modal interaction, which happens when linguistic stimuli contain multiple acoustic and articulatory features, so that the recognition of such stimuli involves the integration of inputs across modalities. In this connection,
Parise (
2016) compares sound-symbolic associations and cross-modal correspondences in his review and suggests that they are related to each other. In our case, all SD-1 Russian words are directly associated with movements, which distinguishes them from other stimuli, so presumably their perception includes visual-motor cross-modal interactions, and this process is reflected in the time delay in the LDT and the inaccuracies of their recognition.
However, different results were obtained for the English stimuli. Paradoxically, SD-1 words are recognized faster and more accurately than arbitrary words and words belonging to SD-2 and SD-4. At the same time, SD-2 words are recognized the most slowly and the least accurately of all words. SD-3 words are recognized the fastest and as accurately as SD-1 words. In addition, SD-4 words are recognized similarly to arbitrary words in terms of speed and accuracy, which coincides with the recognition pattern for the same type of Russian stimuli (SD-4). Given this, and concerning the second hypothesis, our results showed that the differences in the recognition of iconic words in Russian and English are rather language-specific and that there is no similarity in the recognition patterns except for words belonging to the SD-4 group. It is noteworthy that our results concerning the most explicit iconic English words (SD-1) are in line with the results of previous studies, which showed that iconic words are recognized faster (
Sidhu et al. 2020). It should be noted that English words of SD-1 are very different from Russian words of SD-1, since they are related to the emotional component, while Russian words, as mentioned above, are associated with movement. It is likely that different neurocognitive mechanisms underlie the perception of these words, which may cause such a difference in the speed and accuracy of LDT. Concerning SD-2 English words, which are recognized the most slowly and the least accurately (even less accurately than non-words), the results could be attributable to their grammatical ambiguity. English words like
buzz or
puff can function as (1) interjections, (2) nouns, and (3) verbs. For example,
puff is (1) an interjection representing ‘the act of blowing a puff of air, smoke’, (2) a noun referring to ‘the action of puffing’, and (3) a verb with the meaning ‘to blow out (air, one’s breath, smoke, etc.)’ (OED). This structural ambiguity might have caused hesitation in categorizing these stimuli as ‘words’ or ‘non-words’. Moreover, the SD-2 group is characterized by the lowest accuracy rate (63.7) among both the English and the Russian stimuli.
As for the group of non-words, it was used to balance word decisions with non-word decisions. The non-words were constructed according to the phonotactic constraints of English and Russian, respectively. Moreover, they contained legal letter strings, so that they could be pronounced like regular English/Russian words. The results showed a general tendency in both languages toward slower but more accurate recognition of non-words in comparison with the iconic stimuli taken together.
Overall, the results show different recognition patterns for English and Russian iconic words at different stages of de-iconization, except for SD-4 words. The English participants recognized the words at SD-1, SD-3, and SD-4 at roughly the same speed. However, we observe a different pattern in the responses of the Russian subjects: the RT decreased from non-words through SD-1 to the SD-3 group, i.e., the more de-iconized a word is, the faster it is identified. Surprisingly, however, there was a sudden increase in RT for the most de-iconized words at SD-4, as presented in
Figure 2. Presumably, this could be attributed to the fact that this group included three borrowed words (
lunch,
putsch, and
puff) that were recognized more slowly and with more errors than the other words, which might have affected the speed and accuracy of word recognition for the whole group. The overall RA for all stimulus groups was rather high for both participant groups: 80% for the English participants; for the Russian participants, the RA was higher for non-words and the SD-2 group (ca. 90%), and the mean accuracy for the other stimulus groups was 80%. This might indicate that, along with orthographic and phonological representations, the participants activated the semantic code.
Recent research (
Sidhu et al. 2020;
Aryani et al. 2019) has provided evidence for the facilitative role of iconicity in language processing. Since iconic words possess a more direct link between phonology and semantics, their processing is argued to be faster. This could be one of the factors that influenced the responses of the English participants. Although the words at SD-1 are difficult to assign to one of the major word classes, they were recognized as quickly as SD-3 and SD-4 stimuli, and even faster than SD-2 words. This raises the question of why the Russian native speakers did not demonstrate the same pattern for this group of stimuli. A possible, speculative explanation is based on the frequency of SD-1 words in written texts. It should also be noted that English, unlike Russian, has a large number of semantically ambiguous words (i.e., words with more than one meaning) that are processed and represented differently in the human mind (
Eddington and Tokowicz 2015).
It is important to mention that most research on iconicity understandably focuses on auditory perception, since iconic words, at least explicit iconic words at SD-1, are mostly used in informal everyday speech. The current study is one of the few to provide insight into the visual perception of iconic words. Although we controlled for the word frequency of the stimuli, it would be advantageous to control specifically for word frequency in contemporary written texts, since words that appear most often in printed language are easier to recognize than words that appear less frequently (
Grainger 1990). The study by
Kalman and Gergle (
2014) revealed the relatively high written frequency of interjections (which correspond to SD-1 words in our study) in computer-mediated communication (CMC) in English. They explored letter repetitions as a CMC cue in a collection of emails. Their findings revealed that many of the repetitions were onomatopoeic words (e.g.,
Boommm!
Craaash!
Ooops!), which were used to imitate spoken non-verbal cues. Another study, which focused on the subtitles of British television programs, revealed that they often contain “typos and other non-word-like structures (like
aaaarrrrgh or
zzzzzzzzzzzz)” (
Van Heuven et al. 2014). Such examples represent ‘new’ imitative SD-1 words and show that such otherwise rare lexis appears in subtitles quite frequently. In these words, letter repetition indicates gemination (consonant lengthening) or vowel lengthening, both of which are typical traits of ideophones (
Voeltz and Kilian-Hatz 2001) and SD-1 onomatopoeic interjections (
Flaksman 2015). We assume that such frequency effects may be reflected in the performance of the English participants. To our knowledge, there are no similar studies for Russian; it is therefore difficult to draw conclusions in this respect. Word prevalence (i.e., the number of people who know the word) might be an additional factor explaining why some low-frequency words are recognized as quickly and accurately as high-frequency words (
Brysbaert et al. 2019). These observations may account for the significant delay in the recognition of words at SD-1 in Russian.
Last, but not least, we should take into consideration the differences between individuals in terms of their cognitive abilities, for example, in the degree of semantic reliance. There are also the psycholinguistic features of target word stimuli to consider such as familiarity, imageability, emotional valence, and emotional arousal, which can affect the process of visual recognition when filled with an individual meaning for the participant (
Citron et al. 2014). The task in the current study was to answer as quickly and accurately as possible, with the emphasis on speed over accuracy. This means that subjects would tend to rely on the orthographic code, which results in faster responses to words with more neighbors (
Binder et al. 2003). However, if some of the subjects chose to put greater emphasis on accuracy (in which case we would observe the reverse picture: responses to words with few or no neighbors are faster), a somewhat different explanation for the response patterns could be provided. If this was the case with our English participants, for example, it could partially account for their response pattern (the SD-2 group was the slowest to be recognized among all SD groups, with a mean number of neighbors of 2.1).