Toward a new scientific visualization for the language sciences. Information 2012, 3, 124–150. © 2012 by the authors; licensee MDPI

All scientists use data visualizations to discover patterns in their phenomena that may have otherwise gone unnoticed. Likewise, we also use scientific visualizations to help us describe our verbal theories and predict those data patterns. But scientific visualization may also constitute a hindrance to theory development when new data cannot be accommodated by the current dominant framework. Here we argue that the sciences of language are currently in an interim stage using an increasingly outdated scientific visualization borrowed from the box-and-arrow flow charts of the early days of engineering and computer science. The original (and not yet fully discarded) version of this obsolete model assumes that the language faculty is composed of autonomously organized levels of linguistic representation, which in turn are assumed to be modular, organized in rank order of dominance, and feed unidirectionally into one another in stage-like algorithmic procedures. We review relevant literature in psycholinguistics and language acquisition that cannot be accommodated by the received model. Both learning and processing of language in children and adults, at various putative ‘levels’ of representation, appear to be highly integrated and interdependent, and function simultaneously rather than sequentially. The fact that half of the field sees these findings as trivially true and the other half argues fiercely against them suggests to us that the sciences of language are on the brink of a paradigm shift. We submit a new scientific visualization for language, in which stacked levels of linguistic representation are replaced by trajectories in a multidimensional space. This is not a mere redescription. Processing language in the brain equates to traversing such a space in regions afforded by multiple probabilistic cues that simultaneously activate different linguistic representations. 
Much still needs to be done to convert this scientific visualization into actual implemented models, but at present it allows language scientists to envision new concepts and venues for research that may assist the field in transitioning to a new conceptualization, and provide a clear direction for the next decade.

Keywords: dynamical systems; language; language acquisition; psycholinguistics; sentence processing; scientific visualization

“Thus, the ‘atomistic’ attitude to words has been dropped and instead our point of view is rather similar to that of field theory in physics, in which ‘particles’ are only convenient abstractions from the whole movement. Similarly, we may say that language is an undivided field of movement, involving sound, meaning, attention-calling, emotional and muscular reflexes, etc.” (David Bohm, Wholeness and the Implicate Order, 1980)

“How dishonest I feel – as ‘expert’ in atomic reality – whenever I draw for schoolchildren the popular planetary picture of the atom; it was known to be a lie even in their grandparents' day.” (Nick Herbert, Quantum Reality, 1985)

Scientific Visualizations of Theoretical Frameworks
Recently, data visualization techniques have been developed to better understand complex empirical data. When one applies a data visualization technique, one often sees patterns in the data that would have gone unnoticed without that visualization. Likewise, in scientific enquiry more generally, formal descriptions are often accompanied by scientific visualizations, i.e., pictorial descriptions that assist scientists in conceptualizing and communicating a theory [1,2]. These visualizations are particularly useful when the scale of the phenomenon under scrutiny is orders of magnitude outside the human visible range (e.g., the atomic and subatomic structure of matter), or when the studied phenomenon comprises complex abstract relations and properties (e.g., how the human mind learns and processes language). As such, model visualizations often allow researchers to draw analogies and are constitutive of scientific endeavor because they can either hinder or promote paradigm shifts. Most will remember from their high school physics classes how the atom was conceived and visualized as an indivisible unit of matter (the ‘Solid Sphere’ model, Figure 1A). In the history of atomic theory in physics, the Solid Sphere model was so deeply entrenched in physicists' minds that it persisted long after it became apparent that the atom was not so indivisible. In the late 19th century, Thomson's “Plum Pudding” model (Figure 1B) visualized the atom as a sphere of positive electricity with negative particles embedded throughout. This interim model was an attempt to shoe-horn new empirical data about the divisibility of the atom into the old conceptual framework of the solid sphere. Only with a radical change in visualization to a “Solar System” model by Rutherford (Figure 1C), in which the atom was mostly empty with a compact center, did the field transition to a truly new mathematical formulation, and renewed progress in physics ensued. The new planetary conceptualization further prompted Bohr to abandon classical mechanical theory and make a theoretical leap to quantum theory, a model later refined by Schrödinger (Figure 1D; for a detailed history of the atom, see [3]).
In this article, we track a similar development of psychological theories of language, moving from solid encapsulated modules to a somewhat muddled interim model, and then to a complete reorganization of how to visualize and conceptualize language phenomena. We propose a new scientific visualization of the language system, consistent with a growing body of recent psycholinguistic research. We draw an analogy with the shift in scientific visualizations in physics to argue that researchers in the language sciences have been depending on an obsolete scientific visualization borrowed from box-and-arrow flow charts of the early days of engineering and computer science [4]. In its extreme version, this model assumes a language faculty that is composed of autonomously organized levels of linguistic representation (i.e., phonetic, phonological, lexical, syntactic, semantic, and pragmatic). These levels are frequently assumed to be modular, organized in rank order of dominance, and to feed unidirectionally into one another in stage-based algorithmic procedures (Figure 1E). Contemporary versions of this stage-based modularity account of language have at times relaxed some of these constraints, and moved slightly in the direction of more fluid interaction between modules (e.g., [5–10]). However, we contend that these minor revisions to the box-and-arrow framework are reminiscent of Thomson's “Plum Pudding” model of the atom, in that they are best seen as an interim model that is gradually taking the field toward a radical reconceptualization.
A host of empirical findings in psycholinguistics [11–14] is revealing that the mind does not represent and process language serially, in modular and independent boxes, as suggested by the proverbial computer metaphor of the mind. Therefore, in this article, we outline the emerging new theoretical framework in which processing language in the brain, such as understanding a spoken sentence in real time, equates to traversing a multidimensional state space in regions constrained by multiple probabilistic cues (e.g., sublexical, lexical, semantic, syntactic, pragmatic, etc.) that simultaneously and continuously lend partial activation to various linguistic representations. For instance, as we will detail, phonetic variation can influence a syntactic parse from the bottom up, and a pragmatic inference can alter the perception of an ambiguously heard word in a top-down fashion. This new scientific visualization of language comprehension as a trajectory through a single multidimensional space, where all information sources and their constraints are conjoined, may help the field let go of its tendency to cling to the original inspiration of boxes-and-arrows and their adjunctive, incremental revisions.
Multiple dimensions simultaneously contribute to the understanding of an utterance, or to bootstrapping the child into its first linguistic productions. In fact, the mind may not even separate representations (e.g., symbols) from processing (e.g., rules applied on those symbols); see [15–17]. Instead, the mind generally combines information sources (e.g., phonology, semantics, syntax, etc.) such that every substantive change in a neural activation pattern (which corresponds to a transition in state space) is impelled by emergent dynamical interactions among multiple types of cues, rather than by an individual command that is generated by a single rule-system or encapsulated module. However, before we flesh out a detailed description of this emerging visualization, the next section first discusses what the traditional visualization of language entails for a characterization of language processing. We then discuss how an interim model evolved during the 1990s to accommodate new empirical data. Finally, we argue that, like the Plum Pudding Model (Figure 1B) and its mere decade of popularity, this interim interactive box-and-arrow model (Figure 1F) should now be abandoned as largely inadequate.

Figure 1.
Visualizations and conceptualizations of the atom have changed dramatically in the history of physics. Physics transitioned from the early Solid Sphere Model (A) to an interim Plum Pudding Model (B) in which the discovery of subatomic particles was ‘shoe-horned’ into the original conceptualization of the solid sphere, to the revolutionary Solar System Model (C) in which most of the atom is empty, to the probabilistic generalization of the solar system model, known as the Electron Cloud Model (D). Changes of visual conceptualizations in the history of linguistic inquiry are also altering our understanding of how the mind represents and processes language. The standard Box-and-Arrow Model of language (E), borrowed from flow chart descriptions in engineering and computer science in the 1950s, posited entirely independent modules with a feed-forward flow of information. The figure represents a model of language comprehension. For language production, the direction of the arrows is simply reversed, yet the model still treats the system as entirely unidirectional in information flow. The interim model (F) includes a tangle of additional arrows to accommodate recent evidence for feedback processes among the putative levels. Blue arrows indicate evidence from adult psycholinguistics; orange arrows indicate evidence from language acquisition. Analogous to the Plum Pudding Model, this interim visualization is an attempt to ‘shoe-horn’ the flow of multiple sources of information into the old tenet of encapsulated modularity. Dotted frames around the ‘modules’ indicate how these have been progressively understood as less and less encapsulated than in the standard model, due to the highly interactive nature of language processing. The State-Space Model (G), and its statistical generalization (H), is a complete reorganization of how to visualize the way language works. If the various information sources in language are actually interdependent, probabilistic, and continuously integrated, then they should be conceived of as sharing a conjoined state space. For example, different utterances (i.e., trajectories) that use the same verb would all briefly visit that verb's general region in this conjoined linguistic space. In this figure, only three dimensions are depicted for ease of representation, with the full model understood as a very high-dimensional state space, whose axes are not necessarily orthogonal. In addition, each knowledge dimension may be composed of several sub-dimensions, each bringing a weighted contribution. For instance, prosodic, metric and phonetic information all contribute to phonological knowledge. The solid arrows in Figure 1F indicate feedback and feed-forward influences among linguistic dimensions as suggested in our review of empirical studies.

The Traditional Visualization of the Language System
In 1951, at a symposium at Cornell University, a group of psychologists and linguists including John Carroll, Charles Osgood and Thomas Sebeok gave birth to a new science and named it “psycholinguistics” [18]. Although their theoretical perspective stemmed predominantly from behaviorism, the field of psycholinguistics itself quickly came under the influence of the same burgeoning computing theory that was influencing the rest of cognitive psychology, with its engineering descriptions of components and flow-charts [19]. Before long, logical formalisms from linguistics [20] and information-processing models from cognitive psychology [21] ushered into ascendancy the box-and-arrow model on which psycholinguistics textbooks continue to focus (Figure 1F).
Three fundamental assumptions underlie this traditional box-and-arrow model of language: (1) linguistic knowledge is encapsulated into discrete modular levels of analysis and representation, (2) stage-based, feed-forward processes regulate the flow of information between levels, (3) processes are rank-ordered, i.e., earlier processes take priority.
These three assumptions are captured in the box-and-arrow model of Figure 1E. Boxes correspond to the static and encapsulated levels, whereas arrows represent distinct processes operating serially on linguistic knowledge. Modularity posits that linguistic levels are informationally encapsulated, i.e., the workings of each level can be explained independently from any other level. The assumption of stage-based, feed-forward flow of information posits that the neural subsystems responsible for each level of representation wait until a stable unique representation has been computed before that information is passed on to the next processing stage. The rank order assumption further posits that a given level is dominant over others.
Numerous empirical results have been interpreted to support this box-and-arrow scientific visualization. We highlight two sets of findings that formed a large part of the evidence brought to bear for feed-forward modularity in its heyday in the 1980s. The first claim grew out of experiments in syntactic ambiguity resolution. It was thought that the language processor computed a unique syntactic analysis of a sentence by default, without any influence from semantic content or contextual plausibility. A sentence like “The horse raced past the barn fell” [22] contains a temporary syntactic ambiguity, which produces significantly slowed reading times. One interpretation of this phenomenon is that a reader or listener builds the syntactic structure consistent with the horse doing the racing (rather than being raced by someone) because of a syntactic preference for the structure with the fewest branching nodes [23], and this essentially leads comprehension “down a garden path” that is inconsistent with subsequent input. Therefore, by the end of the sentence, the verb “fell” has nowhere to attach and thus cannot be grammatically integrated into the sentence, producing confusion and long reading times. Thus, the syntactic parse is obligatorily launched down a path because syntax is an independent module (see 1 above) [24,25]; this parse can only be ‘repaired’ at the next clause boundary (if incorrect) because it is a linear feed-forward process (see 2) [26]; and contextual biases from semantics and pragmatics cannot prevent the garden-path effect because syntax has priority over them (see 3) [27,28]. That is, much like syntax would appear to only use what it is given by the lexicon (it cannot make its own suggestions back to the lexicon), semantics would only initially use what it is given by syntax (it cannot make its own suggestions back to syntax).
The second claim supporting this feed-forward modular box-and-arrow framework for psycholinguistics resulted from experiments in lexical ambiguity resolution. For example, the word “rose” is ambiguous between being a noun and a past tense verb. Immediately after hearing “They all rose”, participants show priming for both versions of the word, responding equally quickly to “stood” and to “flower”, but a couple of hundred milliseconds later, the influence of syntactic context causes priming to be limited to the verb meaning, “stood” [29–31]. It was argued that the lexical module is autonomous, and therefore there is a brief period of time during which only phonological input (no contextual information) can activate lexical entries in the mental lexicon. Somewhat similar limitations on context have been reported in studies of the reading of ambiguous words [32,33].
Due to findings like these, the feed-forward modular framework has generally set the psycholinguistic agenda for the past few decades. One could argue that the box-and-arrow scientific visualization (Figure 1E) has guided the search for empirical data to a much greater extent than the empirical data have guided the choice of that model. In fact, in 1987, Arthur Reber lamented this overwhelming a priori preference for model over data as spelling doom for the field [34]. This situation began to change in the late 1980s and throughout the 1990s.
As we briefly recount a few examples of the many dissenting findings from the past 20 years, one can observe that the deconstruction of the box-and-arrow model has proceeded in two steps. First, it has become evident that a much greater number of feed-forward and feed-back processes (arrows) were needed among the putative levels. Over the years, this has led to the gradual development of an interim model of highly-interactive processes, in which a semblance of seriality and modularity is preserved. The second stage of deconstruction chipped away at the very notions of modularity and seriality, arguing that the interactive processes are inherently continuous, not occurring in temporal stages, and that the representations themselves are not static objects but temporally dynamic events. The result is that both the boxes and the arrows have lost much of their descriptive power and psychological reality. Therefore, now is the time when a new scientific visualization of language is needed.

An Interim Scientific Visualization: Projections All the Way Up and Down
Recent psycholinguistic studies have indicated that processing language involves both a bottom-up and a top-down information flow. In the laboratory, this can be shown most clearly by eliciting some form of temporary linguistic ambiguity, observing how the system settles into one among several potential interpretations of a given input, and tracking the fine-grained time course of such interpretation. There is empirical evidence that top-down information flows at several levels of analysis. Lexical effects on speech perception are shown when a target word (e.g., /male/) is spotted earlier and more accurately when preceded by another word (e.g., /calculusmale/) than when preceded by a non-word (e.g., /baltulufmale/). This is the case even when bottom-up information from phonotactics facilitates the non-word condition (/fm/ is less frequent within words than /sm/) [35]. Lexical information facilitates speech perception especially when sounds are ambiguous or degraded [36]. There are top-down effects of syntax on speech perception: when phrases with alternative possible segmentation (/take spins/ or /takes pins/) follow a plural context (/those women takespins/), the syntactically congruent target (/spins/) is detected faster, even if “pins” is acoustically favored by bottom-up coarticulation [37].
Effects of semantics on speech perception are also documented: monitoring a target word (e.g., /gap/) is faster and more accurate when the preceding word is semantically related (/deepeninggap/) than when it is unrelated (/pseudonymgap/). This is the case even if phonotactic regularities favor the semantically unrelated condition (/mg/ is a low frequency diphone in English words, as contrasted with the more frequent /ng/) [35]. Higher-order information is also effective in speech disambiguation. Pragmatic influences on phonetic segmentation show that information inconsistent with acoustic cues causes listeners to modify their segmentation choice in the direction of the context [38]. When presented with unsegmented near-homophonous phrases (/plum pie/ or /plump eye/), the target that was pragmatically congruent with a preceding appended phrase was chosen faster and more accurately. Thus, “The baker looked at the drawing of a…” favored a segmentation into “plum pie”, whereas “The surgeon looked at the drawing of a…” favored a segmentation into “plump eye”. Rohde & Ettlinger [39] embedded acoustically ambiguous pronouns (sounding between “he” and “she”) in sentence contexts that were either female-biasing (e.g., “Abigail annoyed Bruce because Xe was in a bad mood”) or male-biasing (“Tyler deceived Naomi because Xe couldn't understand the situation”). Their participants' acoustic judgments of the pronouns were influenced by the biasing contexts, suggesting that interactive processes emerge between the two most separate domains, phonetic perception and discourse-level pragmatics. In addition, time-course analyses suggested that the effect is present at the earliest stage of processing.
Pragmatic effects on semantic interpretation, or “pragmatic normalization”, have been invoked to cover cases where a knowledge-based interpretation is given to sentences expressing unusual propositions [40]. Situational context affects semantics. In the presence of a semantically ambiguous word, e.g., “bulb”, a strongly constraining context such as “The gardener dug a hole. She inserted the bulb carefully into the soil” primed only a contextually supported meaning (e.g., flower; [41], see also [42–44]). Conversely, a weakly constraining context such as “The scout patrolled the area. He reported the mine to the commander” primed both senses (e.g., coal and explosive) of the ambiguous noun “mine” [45]. Word recognition is also facilitated when real world contextual information is provided, suggesting that situational context affects lexical processing [46]. Perhaps most famously, there are clear effects of semantics on syntax. The sentence “The land mine buried in the sand exploded” has exactly the same structure as “The horse raced past the barn fell” but crucially does not induce a syntactic garden-path effect. If syntax were an independent module, it should be equally difficult to process these two sentences. However, the semantic constraints imposed by the lexical items “land mine” and “buried” steer the reader away from the garden path, implicating a more interactive perspective on sentence processing [47]. Van Berkum, Brown, & Hagoort [48] obtained pragmatic effects on syntactic processing.

Boxes and Arrows in Language Acquisition
The box-and-arrow model implicitly underlies many of the assumptions about putative developmental stages of language acquisition. Here the question typically posed is “what source of information introduces the child to the language acquisition process?” Several proposals have been put forth. Semantics cueing syntax: Young children may discover lexical categories by first noting semantic or referential information. For instance, people and objects are linked to nouns, actions are linked to verbs, and agents of actions are linked to subjects [49,50]. Similarly, it has been proposed that syntax can cue semantics. Verb syntactic frames may help the child narrow down the meaning of verb structures [51,52]. Other proposals have highlighted how higher-order information can be gleaned from low-level cues. There are phonetic cues to the lexical representations: within their first year of life infants become sensitive to language-specific cues such as coarticulation (in /aiskrim/ the diphone /sk/ is more overlapping in “I scream” while the diphone /is/ is more coarticulated in “ice cream”), predominant stress patterns (in English, stress-initial words, e.g., racket, are much more frequent than stress-final words, e.g., guitar; [53]), and phonotactics (the segment /-ng/ never starts a word in English; for a review, see [54]). Other studies have shown that phonetics can cue syntax. Prosodic cues that are perceptually available in the first year of life (intonation and stress patterns, phoneme coarticulation) cue the child to discover the structural units of language, word and phrase structure boundaries [55–57]. Sounds at the edge of words can reliably signal noun and verb categories [58]. Noticing transitional probabilities between phonemes and syllables can indicate word boundaries in running speech [59], and distributional relationships among form-based cues can cue syntactic categories. For instance, the child could note co-occurrence relations between certain fragments (the_cat, a_car, and has_gone, to_play, but not the_gone, to_cat) or use distributional regularities in word endings (work-s, work-ing, work-ed) to infer lexical categories [60]. Social and situational factors impact perceptual learning: American infants learned a Chinese contrast not present in English only when they were immersed in live social interactions, but crucially not via TV or audio-only [61].
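To make the transitional-probability cue concrete, the following is a minimal sketch in Python, not the procedure used in [59]. It assumes a hypothetical toy stream built from three made-up trisyllabic words, estimates the forward transitional probability between adjacent syllables, and posits a word boundary wherever that probability dips below an arbitrary threshold. The syllables, the word order, and the threshold value are all illustrative assumptions.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """Forward TP(x -> y) = count(x followed by y) / count(x followed by anything)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(x, y): n / first_counts[x] for (x, y), n in pair_counts.items()}

def segment(syllables, tps, threshold):
    """Place a word boundary wherever the TP between adjacent syllables is below threshold."""
    words, current = [], [syllables[0]]
    for x, y in zip(syllables, syllables[1:]):
        if tps[(x, y)] < threshold:
            words.append("".join(current))
            current = []
        current.append(y)
    words.append("".join(current))
    return words

# Hypothetical lexicon and word order (assumptions for illustration only).
LEXICON = {"A": ["bi", "da", "ku"], "B": ["pa", "do", "ti"], "C": ["go", "la", "bu"]}
ORDER = "ABCACBABCBACBCA"

stream = [syllable for word in ORDER for syllable in LEXICON[word]]
tps = transitional_probabilities(stream)
words = segment(stream, tps, threshold=0.9)
print(words)
```

In this toy stream, syllable pairs inside a word always co-occur, whereas pairs that straddle a word boundary are less predictable, so the dips in transitional probability line up with the word edges. Real speech is far noisier, which is one reason why a single cue of this kind is unlikely to suffice on its own.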

Limits of the Interim Model
The review above highlights how a strong interconnectivity exists between information sources in language at all putative levels, both in adult language processing and in language acquisition. In principle, a box-and-arrow model augmented with additional processing arrows could still be invoked as consistent with the above data. Under this “interim” paradigm, it may still be assumed that there are identifiably separate levels of representation in which information flows both “bottom-up” and “top-down”. For example, a modular account of single word reading, the dual-route model (e.g., [62]), was updated to share some assumptions about cascaded processing and bi-directional information flow that are embodied in more neurally-inspired connectionist models (e.g., [63]). The MERGE model of speech perception [64,65] accounts for higher-order word-level influence on phonemic decisions by asserting that while speech perception is feed-forward, phonemic decisions are made by a (later) decision-making mechanism that is sensitive to multiple levels of representation (but cf. [66]). Similarly, the variable-choice reanalysis model of syntactic processing accommodates immediate integration of multiple sources of contextual information, while still attempting to preserve a discrete stage-based selection among syntactic alternatives [67,68] (but cf. [69]).
While interactive processing remains an area of intense debate, we identify here some important limitations that derive from thinking about language in terms of separate levels of representation and processing. We focus on three arguments from the following research areas: (1) language development; (2) continuous linguistic processing in adults; and (3) the neurocognitive architecture of the brain.
Let us consider language acquisition by children first. Thinking in terms of self-contained modular blocks of knowledge operated by stage-based processes implies that the child must first possess mature linguistic competence in at least one putative linguistic level, which would bootstrap the rest, such as lexical segmentation depending on fully-learned phonology. This assumption has led researchers into severe ‘chicken-and-egg’ conundrums. For instance, the very existence of lexical categories (e.g., dog and dinner are nouns, while eat and run are verbs) presupposes knowledge of syntactic rules that apply to them, such as how to generate noun and verb phrases. However, to learn those syntactic rules, one must already have those lexical categories in place. In this and other types of chicken-and-egg paradoxes, the question that has been (ill-)posed is “what knowledge comes absolutely first in language development?” In fact, both perceptual [70] and statistical learning abilities [71] of infants and toddlers develop gradually and simultaneously over time. Likewise, syntactic knowledge in toddlers such as verb argument structure [72] and pragmatic knowledge [73] seem far from adult competence. Several corpus and behavioral studies now suggest that children's early language tends to contain mostly phrases and utterance fragments that have been heard and used before. Very little substitution of lexical items or application of abstract syntactic patterns occurs in early language productions [74].
Since no level of representation can be independently mature at any stage to single-handedly kick start the learning process of the others, a more viable solution is that different representations co-develop gradually, mutually assisting each other via multi-directional correlations. This characterization of language development is inconsistent with the modular view of language knowledge as encapsulated into independent boxes.
Another related problem with the single bootstrapping ‘level’ is that invariably no single cueing approach provides a perfect correlation with a given linguistic structure. For instance, distributional learning can fall prey to spurious correlations such as “John eats meat”, “John eats slowly”, “The meat is good”, which would erroneously lead the child to infer that “The slowly is good” should be grammatical [50,75]. Likewise, prosodic cueing is only partially successful because phonological phrase boundaries do not map perfectly onto syntactic phrase boundaries [76–78]. Similarly, Pinker [79] discussed the limitations of syntactic information for discovering detailed semantic aspects of verbs (the syntactic bootstrapping argument). He pointed to the semantics that could be gleaned from each of the three following examples: “Sew the shirt” (indicating activity over an object), “Sew me a shirt” (activity of creating an object for a beneficiary), “Sew a shirt out of the rags” (activity of transformation of a material into an object). Pinker's conclusion was that only a broad semantic inference of activity could be gleaned from the set of syntactic constructions in which the verb “sew” occurs, but nothing like an enumerated list of semantic features. The semantic bootstrapping argument too seems to suffer from limitations, as many early nouns and verbs do not refer to things and events in the world, many words in English are both verbs and nouns, and all other lexical categories in the best cases display many-to-many mappings with the physical world (colors, as indicated by adjectives, can refer to quite disjointed sets of objects). Thus, all cueing approaches mentioned above would seem to fail if taken in isolation, and indeed this fact has been used as an argument to disqualify a given source of information as useful at all. However, these counter-arguments, instead of being used against each other, should be taken as suggesting that the putative ‘levels’ of linguistic knowledge can only develop if they are conceived as part of a highly interactive system, i.e., if they are allowed to modify and to be modified at the same time, by developing several multi-directional projections and connectivities. We contend that one way to avoid the paradoxes and problems expounded above is to abandon the idea that there exist modular levels of representations. Thus, new research can begin examining how various sources of information contribute to the co-emergence of dimensions of representation [80]. Since each dimension of knowledge emerges gradually, multiple dimensions can provide statistical support to one another in highly interactive ways, and perfect correlations are not necessary.
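The spurious inference produced by purely distributional learning can be reproduced with a minimal sketch. It assumes a deliberately naive learner, hypothetical and not drawn from [50,75], that treats any two words occurring in the same sentence frame as interchangeable everywhere. The function names and the whole-sentence definition of a frame are illustrative assumptions.

```python
from collections import defaultdict

def substitution_classes(sentences):
    """Group words that fill the same slot in an otherwise identical sentence frame."""
    classes = defaultdict(set)
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            frame = tuple(words[:i] + ["_"] + words[i + 1:])
            classes[frame].add(word)
    return [members for members in classes.values() if len(members) > 1]

def generalize(sentences, classes):
    """Swap each word for its class-mates and return the novel sentences this licenses."""
    generated = set()
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            for members in classes:
                if word in members:
                    for other in members - {word}:
                        generated.add(" ".join(words[:i] + [other] + words[i + 1:]))
    return generated - set(sentences)

corpus = ["John eats meat", "John eats slowly", "The meat is good"]
classes = substitution_classes(corpus)
novel = generalize(corpus, classes)
print(novel)
```

Because “meat” and “slowly” share the frame “John eats _”, this learner places them in one class and therefore licenses “The slowly is good”. The point of the sketch is only that a distributional cue used in isolation overgenerates; it says nothing about how additional cues would be weighted against it.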
A second serious problem with the box-and-arrow model strikes at the very core of the modularity and seriality assumptions. It has been shown that the uptake of linguistic information in real time is continuous (not stage-like) and is simultaneously mediated by several sources of information. For example, as a spoken word unfolds over the course of a few hundred milliseconds, the continuous uptake of acoustic-phonetic information instigates partial activation of numerous lexical representations that share the same initial phonetic features [46,81–83]. This “cohort” of activated lexical representations quickly winnows as the latter portion of the word is finished. Thus, during the first hundred milliseconds of hearing the word “candy” being spoken, the brain partially activates mental representations for candy, candle, candid, candelabra, cannelloni, cancer, etc. During the second hundred milliseconds, a smaller set of just candy, candle, candid and candelabra is active. Finally, near the end of the word, only candy remains substantially activated. This means that the flow of information from the phonetics “level” to the lexical “level” is uninterrupted. Moreover, as feedback from the lexical “level” to the phonetics “level” appears to be similarly continuous [66,84], the motivation for treating these “levels” as though they are separate systems arranged in a meaningful sequence melts away, encouraging us instead to treat phonetics, the lexicon, and perhaps even semantic properties of words as co-existing in a unified high-dimensional state space [85].
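The winnowing of the cohort described above can be illustrated with a minimal sketch. It is a toy under stated assumptions, not a model drawn from the cited literature: the lexicon, the use of spelling as a stand-in for acoustic-phonetic input, the function names, and the exponential mismatch penalty are all hypothetical choices made for illustration.

```python
import math

# Hypothetical toy lexicon; spelling stands in for the acoustic-phonetic signal.
LEXICON = ["candy", "candle", "candid", "candelabra", "cannelloni", "cancer", "table"]


def shared_prefix_len(a, b):
    """Number of leading segments that two strings have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def cohort_activation(heard, lexicon=LEXICON, penalty=2.0):
    """Graded, normalized activation of every word given the input so far.

    Each mismatching segment multiplies a word's support by exp(-penalty),
    so candidates fade gradually instead of being deleted in a discrete step.
    """
    raw = {}
    for word in lexicon:
        mismatches = len(heard) - shared_prefix_len(heard, word)
        raw[word] = math.exp(-penalty * mismatches)
    total = sum(raw.values())
    return {word: value / total for word, value in raw.items()}


if __name__ == "__main__":
    # Three successive snapshots of the unfolding word.
    for heard in ("can", "cand", "candy"):
        activation = cohort_activation(heard)
        ranked = sorted(activation.items(), key=lambda kv: -kv[1])
        print(heard, [(w, round(a, 2)) for w, a in ranked[:4]])
```

In this sketch no candidate is ever removed from consideration; activation is simply redistributed as more input arrives, which is the sense in which the flow from phonetics to the lexicon is continuous rather than stage-like.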
Similar to spoken word recognition, the field of sentence processing has been gradually approaching consensus with regard to the rapid integration of cues from bottom-up and top-down processes. Just as the continuous uptake of the acoustic signal during spoken word recognition involves the integration of multiple cues at the time scale of dozens of milliseconds, so does the resolution of syntactic ambiguity involve the continuous integration of multiple cues at the time scale of hundreds of milliseconds [86–89]. The research points to an account of real-time language comprehension that continuously integrates lexical, syntactic, semantic, discourse, visual, and even situational variables. In light of this work, language comprehension no longer looks like the functioning of a rule-based flowchart, with subprocessors waiting until they finish constructing a symbolic representation before sending it to the next subprocessor. Partial, incomplete information (in the form of probabilistic biases) is shared continuously, and interpretation does not depend on waiting until a ‘syntactic module’ has completed its autonomous parsing process.
A third argument in favor of seeing language as a fully connected multi-dimensional space comes from neuroscience and the architecture of the brain. Electrophysiological recordings show that distributed representations are widely used in the cortex [90]. A distributed representation uses multiple active neuron-like processing units to encode information, and the same unit can participate in multiple representations. Units in distributed representations may represent single features (such as that a sound is plosive) or combinations of features. The implications for language are that it becomes hard to conceive of words as individual symbolic entities stored in separate mental drawers. For example, a word beginning with /sp/ activates partial neuronal patterns that are simultaneous representations of different words from different semantic categories (e.g., spice, spy, spoon), and syntactic classes (spying, spongiform, spite, spoiler). If phonetic activations are part-and-parcel of a word, it is hard to conceive of all the nouns being grouped in a separate lexical store without at least partially activating those adjectives and verbs that have overlapping phonetic features.
Further evidence from neuroscience in support of interactive processes is that bidirectional connectivity is ubiquitous in cortex [91,92], with communication of activation flowing simultaneously in both bottom-up and top-down directions. Furthermore, there is increasing neuropsychological and electrophysiological evidence that some brain regions often assumed to be language-specific (e.g., Broca's area) are implicated in non-linguistic cognitive processing such as music [93,94]. Moreover, neural signatures of order violations in the learning of sequenced patterns are similar to those evoked by structural violations in natural language [95]. Therefore, the localization of certain linguistic functions is coextensive with at least some other non-linguistic domains. These findings make informationally-encapsulated modular linguistic subsystems much less plausible.
We have argued that the linguistic information involved in language processing and language learning appears to be highly integrated and interconnected, and that its uptake is continuous at the time scales of language comprehension and of language acquisition. Given such a characterization, it appears increasingly inadequate to visualize language processing as a box-and-arrow flow-chart with processes (arrows) coming in and out of static encapsulated levels. The notion of a module only makes sense if there is a delay in the information flow such that a given subsystem is forced, during at least some short period of time, to make decisions without assistance from contextual information sources. If the information flow between putative modules is continuous, then there is no period of time during which a given subsystem is making decisions based purely on its own internal algorithms. Under such circumstances, referring to the subsystem as a “module” is no longer coherent. With so many incoming arrows (Figure 1F), and with their influence being continuous in time, any given box is clearly doing quite a bit more than its label implies. For example, if the syntax module's parsing decisions are being immediately influenced by biases from phonetics [96], semantics [97], and the situation model [98], such that those contextual biases are essentially functioning as part of the parsing algorithm itself, then is it really useful to still think of it as a syntax module?
Our impression is that the sciences of language may be at a historical moment similar to when physicists around the turn of the 20th century could no longer fit the findings of subatomic entities into the obsolete uncuttable Solid Sphere model (Figure 1A), or even into the modified version of that theory, the Plum Pudding Model (Figure 1B). They ultimately had to develop a completely new scientific visualization of atomic structure. Likewise, in the following section we spell out a new scientific visualization for language that is congruent with the continuous, interactive and overlapping uptake of linguistic information during real-time language processing and acquisition.

A New Scientific Visualization: The Multidimensional Space Model
Multidimensional spaces have been used in cognitive science to examine the graded similarity among cognitive categories of various types, including visual objects [99], concepts [100], semantics [101,102], phonetics [103,104] and syntax [105,106]. Given the numerous findings recounted above of mutual interaction among linguistic information sources, it seems likely that what is needed is a single conjoined state space that combines phonetic cues (including intonational cues), visual cues (including facial expressions and gestures associated with speech [107,108]), distributional/syntactic cues, and other situational cues from the environment. In this high-dimensional state space, there would be frequently traveled ruts corresponding to highly common forms of utterances, and there would be graded constraints (forces of attraction) that usually prevent certain types of trajectories corresponding to ungrammatical or nonsensical utterances [109,110]. The wide range of state-space trajectories lying in between those hackneyed ruts and those infelicitous pathways constitutes the creative nuances of productive sentence construction that were brought to our attention fifty years ago [20].
When one switches from treating language as a linear series of complex logic gates to treating it as a nonlinear dynamical system such as this, there are some important changes in a number of model properties. For example, capacity limitations take on a very different character in a continuous state space than cognitive scientists have grown accustomed to with their computer metaphor of the mind. With a multidimensional state space, temporary ambiguity in a time-dependent signal (such as a word or a sentence) is not implemented as several mutually exclusive symbolic alternatives that are held in working memory, where discretely enumerable capacity limitations could become a serious impediment. Rather, temporary ambiguity becomes nothing more than having the state of the system move into near-equal proximity to multiple attractors in state space. Whether this location in state space is equidistant from 2, 3, or 19 alternative attractors, the concerns regarding a combinatorial explosion exceeding some capacity limitation do not threaten to overload the system's memory buffer. There is always only one set of coordinates being instantiated in the system, regardless of how multiply-ambiguous that location is with respect to nearby attractors. Granted, the state space framework will encounter resolvability problems when the branching alternatives become too numerous (cf. the “curse of dimensionality”, [111]), essentially causing it to lose the ability to distinguish between some of those alternatives. However, in the state space framework, this abundance of ambiguous alternatives will never “crash the program” by violating memory constraints, nor will the system be forced to arbitrarily disregard some alternatives because the buffer is full, which is a concern with rule-and-symbol systems performing incremental processing on input that contains temporary ambiguities [112,113].
There is another important difference between the theoretical framework of boxes and arrows and that of a trajectory through a multidimensional state space. This is that while the former must be viewed as a metaphor that guides research, the latter can be viewed as a concrete mathematical description of a real physical phenomenon. In Figure 2, the top row of panels depicts an idealized set of neurons that can have a range of spike rates, with grey being the baseline spike rate and red, orange and yellow indicating higher spike rates. The seven neurons on the left constitute a coherent group that responds to a particular stimulus (e.g., the neural ensemble that represents the word “candle”). The partially overlapping group of cells on the right (see dashed circles in top right panel) constitutes a coherent group that responds to a similar stimulus (e.g., the neural ensemble that represents the word “candy”). From left to right, these panels show a change in activation pattern in which both neural ensembles (or “population codes”) are slightly active, then more active, then the distribution of activation begins to shift toward the left-hand neural ensemble, and finally only the left-hand neural ensemble is significantly active. Although the process is idealized here as only a handful of neurons for simplicity, the second, third and fourth rows of Figure 2 all depict equivalent data visualizations of the same physical process whereby populations of neurons change their pattern of activation over time [12,114–116]. That is, the dynamic patterns of activated concepts (third row), and the trajectory through an attractor landscape (fourth row), are not mere visual metaphors in the way that the box-and-arrow framework is. They can be taken as mathematical descriptions that quantitatively approximate concrete physical phenomena, and not as metaphorical qualitative abstractions of the physical phenomena.
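To make the idea of a trajectory settling into one of two attractors concrete, the following minimal sketch reduces each population code to a single activation value and lets the two values compete. The competition equations (a Lotka-Volterra-style form with mutual inhibition), the parameter values, and the function name `compete` are assumptions chosen for illustration; they are not taken from Figure 2 or from the cited models.

```python
def compete(candle=0.06, candy=0.05, inhibition=2.0, dt=0.05, steps=800):
    """Euler-integrate two mutually inhibiting activations.

    Each activation grows on its own toward 1.0 but is suppressed by the
    other in proportion to `inhibition`. With inhibition > 1 the system
    has two stable attractors, (1, 0) and (0, 1), and the small initial
    advantage decides which one the trajectory enters.
    """
    trajectory = [(candle, candy)]
    for _ in range(steps):
        d_candle = candle * (1.0 - candle - inhibition * candy)
        d_candy = candy * (1.0 - candy - inhibition * candle)
        candle += dt * d_candle
        candy += dt * d_candy
        trajectory.append((candle, candy))
    return trajectory


if __name__ == "__main__":
    path = compete()
    # Print a few snapshots of the two-dimensional state over time.
    for step in (0, 50, 150, 300, 800):
        a, b = path[step]
        print(f"step {step:3d}: candle={a:.3f} candy={b:.3f}")
```

At every step the system holds exactly one pair of coordinates. Early on, both values rise together (the ambiguous region between the attractors); later the slightly favored code pulls ahead and the other decays, which is the qualitative pattern described for the four time periods of the figure.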
Finally, if we step away from the theoretical perspectives that rely so much on static representational objects that are held in a memory buffer, and focus instead on the observation that a sentence's production and its comprehension take place over a period of time, we can better appreciate this alternative, more neurally plausible, account for how a spoken sentence is understood. While a person hears a sentence, it is obviously not the case that the brain is doing nothing until the sentence is finished and then it constructs a static mental representation. As the sentence unfolds over time, the listener's brain is continuously undergoing changes in its patterns of neural activation that are significantly driven by this environmental auditory input. If we describe these average firing rates over time as locations in a high-dimensional state space, then the changes over time plot a continuous trajectory through that state space. Thus, the understanding of a sentence is here conceived of as an event in the mind, not an object [109,117,118]. If some of the dimensions that one could identify in this state space are largely phonetic, and others are largely semantic, and still others largely syntactic, one can imagine collections of trajectory traces (from different periods of time) that bundle near a particular region of state space (which corresponds to a cluster of highly similar patterns of neural activation) “belonging” to particular words [105,106]. See Figure 1G. When a word is understood, it means that the listener's brain has briefly achieved a pattern of neural activation that maps roughly onto that region in state space. Hence, understanding a sentence involves having this moving average of neural firing rates change over time such that it corresponds to a continuous trajectory through the state space, smoothly traveling from one word's region to another and to another, spending at least as much time in between those attractors as inside them. These word regions then form clusters that act as fuzzy categories of Noun, Verb, Adjective, etc. [105], and within each word's region are sub-regions that correspond to situations in which that same word is being used as the Subject of the sentence or the Object or with even more subtle nuances [106]. Thus, grammatical structure, linguistic productivity, and even systematicity can be achieved in this state space with trajectories that have nested loops on themselves and precise entry and exit points in each word's region [119,120]. See also [121]. In this framework, it becomes useful to conceive of words not as the operands on which linguistic algorithms operate, but instead as the operators themselves, whose directional proclivities in state space are what implement the linguistic algorithms [106].
Figure 2. Two population codes (circled with dashed lines in the rightmost top panel) are mutually competing, and one will win. The figure depicts a theoretical mapping of the equivalence, over four 100 ms time periods, among a neural population code's distributed activity pattern (top row, where red-orange-yellow means greater activity), its cell firing rates (second row), activation of the mental representations that correspond to the two population codes (third row) and the trajectory through state space that settles into one attractor (i.e., stable population code) rather than the other. The spike rates of these cells can be treated as coordinates in a 12-dimensional space, of which the bottom row depicts a 2-dimensional compression. During the first time period (column t1), the two partially overlapping groups of cells each have a few of their members somewhat active and spiking (responding to combinations of phonetics, semantics, syntax, etc.). In the third row, the two representations are slowly emerging during this initial time period (where the activation curves are starting to rise over time). In the fourth row, the state of the system is just beginning to move toward the two attractors (dashed circles) in state space. During the second time period (column t2), both population codes (i.e., representations) are continuing to increase in their respective activation patterns, and the trajectory is moving even closer to the two attractors. (Solid portions of a curve/trajectory indicate activity change during that time period, and dashed portions indicate the history of that activity change.) During the third time period (column t3), competition causes the left-hand group of cells to continue increasing its activity while the right-hand group decreases, thus the state-space trajectory begins to move toward one attractor and away from the other. During the fourth time period (column t4), one population code is nearly saturated while the other is nearly silent (dashed circles), its representation has risen to a high level of activation while the other is nearly inactive, and the trajectory has fully entered the perimeter of a particular attractor.
One possible objection to the proposed view of language processing as a large state space is that it puts no constraints on what regions of the space are equivalent for what purpose and no constraints on what the dimensionality of the space might be. Will each single utterance, in each different conversational situation, addressed to each different recipient, constitute a different trajectory in the state space? Is it not the scientific goal to understand the different types (not different tokens) of sentences or utterances or syntactic or semantic structures? Thus, this model might reasonably appear unsatisfactory because an arbitrarily large number of different states can be traversed, and the present version of the model fails to place boundaries on which trajectories are allowed in the state space and which are not. The attractiveness of the Rutherford model of the atom was not in its arguing that subatomic particles can inhabit any orbit. Rather, Planck's constant was used to provide constraints on where these orbits could be, how many could co-exist and which types of orbit-configurations had similar properties.
This ‘anything goes’ objection to a dynamical systems approach to language is a fair criticism of the current state of the art, but it is not sufficient to cast doubt on the future of this research program. It seems clear from comparing Panels E and F of Figure 1 that the highly-constrained modular approach to language has required so many revisions over the years that it may simply be a misleading default framework to have as our standard. Instead of assuming massive restrictions and then having to relax nearly all of them in the model, perhaps better progress in the language sciences will be made by assuming almost no restrictions in the model and then adding the ones we empirically find over time.

Some New Concepts and Venues for the Language Sciences
We have started reviewing a re-conceptualization implied by a dynamical systems view of language processing. This re-conceptualization was anticipated back in the 1990s in the broader context of cognitive science [122], and enough empirical evidence has accumulated in support since then (partly summarized above) to make it a viable framework for the language sciences. This view promises to go far beyond reinterpreting existing psycholinguistic phenomena and issues, by leading to new avenues of research. In particular, it provides a set of tools for re-conceptualizing language learning. Four such concepts are likely to take center stage in the new science of language: (a) the role of multicausality and circular causality; (b) the notion of emergent structure; (c) the concept of nonlinearity; and (d) a greater attention to understanding individual differences in language. These concepts provide a unifying conceptual toolbox. In reviewing these concepts, we also briefly discuss how novel experimental and statistical techniques might be better suited for the tasks at hand.
Circular causality implies that no single element, source of information, or mechanism in the course of learning and processing language has causal priority independently. Although this may sound like a truism, current widespread statistical methods do not incorporate these concepts. A core assumption behind (multiple) regression analysis, for example, is that the predictor variables make independent contributions to predicting a dependent variable. Notice that the decision of which is the dependent and which the independent variable rests heavily on the theoretical assumptions of the experimenter. The current assumption built into most of our scientific efforts is thus one of imposed unidirectional causality, not of multicausality. Another fundamental assumption in regression is the absence of collinearity among predictor variables. A strong correlation between variables makes it difficult or impossible to estimate their individual regression coefficients reliably. In everyday data collection, however, collinearity is extremely common. Dynamical systems theory turns collinearity into an asset, by acknowledging that multiple variables affect each other over time. In studying language, many processes may be heavily subject to this principle. For example, speech perception abilities get reorganized as a result of exposure to the onset of reading. It has been shown that perception of native and nonnative speech contrasts can be respectively sharpened and attenuated as a result of (late and novel) experience with phoneme-to-grapheme conversion rules as a product of reading instruction [123]. Children between 6 and 8 years who are good readers for their age show greater reduction of perception of nonnative speech contrasts, suggesting that speech perception in this period is affected by a newly (and relatively late) acquired skill.
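The collinearity problem mentioned above for regression can be quantified with the variance inflation factor (VIF). For the two-predictor case it reduces to 1 / (1 - r²), where r is the Pearson correlation between the predictors. The sketch below is only an illustration: the data values and variable names (`vocabulary`, `reading`, `unrelated`) are invented for the example and do not come from any study cited here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(xs, ys):
    """Variance inflation factor when a regression has exactly two predictors."""
    r = pearson(xs, ys)
    return 1.0 / (1.0 - r * r)


# Invented scores for six hypothetical children.
vocabulary = [10, 20, 30, 40, 50, 60]
reading = [12, 19, 33, 41, 48, 62]   # tracks vocabulary closely
unrelated = [5, 1, 4, 2, 6, 3]       # nearly uncorrelated with vocabulary

if __name__ == "__main__":
    print("vocabulary vs reading:  ", round(vif_two_predictors(vocabulary, reading), 1))
    print("vocabulary vs unrelated:", round(vif_two_predictors(vocabulary, unrelated), 2))
```

A VIF near 1 means the coefficient of a predictor can be estimated about as precisely as if it were alone in the model; a large VIF means that the variance of the estimate is inflated by that factor, which is why strongly coupled variables resist being assigned separate causal weights.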
Causal loops may be effective at different time scales, from the microlevel of combining several cues online for sentence comprehension, to the macrolevel of days, months or years of learning, and may have persistent global effects on the system. Recent studies of young children raised in bilingual contexts have observed bidirectional effects of one language on the other in many varieties of linguistic behavior, from pronunciation to morphosyntax, from grammaticality judgments to patterns of lexical storage, activation, and retrieval [124,125]. The interpenetration of the two language systems makes it impossible for either the L1 or the L2 of a bilingual to be identical in all respects to the language of a monolingual. Grosjean notably made the case that a bilingual is not two monolinguals in one person [126].
Causal loops with interactive feedback also get established between interactants. For example, infants whose mothers responded immediately to their vocalizations produced more mature and adult-like vocalizations than when the social interaction was not contingent [127]. In order to study two interactants as a single dyadic system, new quantitative techniques may be required. Dale and Spivey [128] do so by applying recurrence quantification analysis (RQA) to the study of syntactic coordination between child and caregiver in a large corpus of child-parent interactions: “The method is based on analyzing sequences of syntactic elements, time series of grammatical usage, allowing comparison of two such sequences and revealing patterns of recurrence. The ordered sequences of concern here are time series of syntactic class usage by child and caregiver. The approach therefore provides a window on how structures used by the child ‘recur’ in those used by caregiver (and vice versa)” ([128] p. 395). These and other studies suggest a strong role of social interaction in shaping the construction of phonemic and syntactic knowledge. Importantly, the same set of analytic and conceptual techniques based on dynamical analyses can now be applied to study both internal (cognitive) phenomena and social (communicative/interactive) dimensions, promising to bind together both internalist and externalist traditions in the social sciences in a new scientific visualization of language and communication (the theme of the Special Issue of this volume).
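The core idea behind comparing two time series of syntactic class usage can be sketched in a few lines. This is a deliberately simplified categorical cross-recurrence computation, not the procedure of [128]: the sequences, the tag labels, and the function names are hypothetical, and a full RQA reports further measures beyond the two shown here.

```python
def cross_recurrence(child, caregiver):
    """Binary matrix: cell [i][j] is 1 when child[i] equals caregiver[j]."""
    return [[1 if c == g else 0 for g in caregiver] for c in child]


def recurrence_rate(matrix):
    """Proportion of all (child, caregiver) time pairs that match."""
    cells = sum(len(row) for row in matrix)
    return sum(map(sum, matrix)) / cells if cells else 0.0


def lag_profile(child, caregiver, max_lag):
    """Match proportion at each lag.

    A positive lag k compares caregiver[t] with child[t + k], i.e., the
    child reusing a class k steps after the caregiver; a negative lag
    captures the caregiver following the child.
    """
    profile = {}
    for lag in range(-max_lag, max_lag + 1):
        pairs = [(caregiver[t], child[t + lag])
                 for t in range(len(caregiver))
                 if 0 <= t + lag < len(child)]
        profile[lag] = sum(a == b for a, b in pairs) / len(pairs) if pairs else 0.0
    return profile


# Hypothetical syntactic-class sequences; the child echoes the caregiver one step later.
caregiver = ["N", "V", "N", "ADJ", "N", "V"]
child = ["X", "N", "V", "N", "ADJ", "N"]

if __name__ == "__main__":
    print("recurrence rate:", round(recurrence_rate(cross_recurrence(child, caregiver)), 3))
    for lag, value in lag_profile(child, caregiver, 2).items():
        print(f"lag {lag:+d}: {value:.2f}")
```

Because the profile is computed for both positive and negative lags, the same analysis can show the child following the caregiver and the caregiver following the child, which is what makes it suitable for treating the dyad as one coupled system.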
Emergence via multiple cues. Emergence in science refers to the making of new forms through ongoing processes intrinsic to the system. Emergent properties are often higher-order patterns arising from lower-level processes, and their generation is not built into the system a priori. Think of soap bubbles; despite differences in their size and in the amount of air blown into them, soap bubbles invariably ‘come out’ in systematic shapes (e.g., spherical when in the air, hexagonal when packed next to each other in a single plane). Yet there is no ‘program’ dictating the shape of bubbles. Rather, the particular shape emerges as an optimal self-organizing solution to keeping all particles on the surface of the bubble in an equal state of chemical ‘tension’ in relation with neighboring particles. When ported to the study of language, the concept of emergence may have far-reaching consequences. The confluence of several low-level cues (e.g., phonetic) may give rise to higher-order structured representations (e.g., phonemes, syllables, morphemes, etc.). This suggests, for example, that infants' segmentation strategies across languages (syllable, time, or mora-based) may emerge as optimal solutions to probabilistic cues inherent in speech. Rather than assuming that pre-existing inbuilt switches are turned on early in infancy, such segmentation preferences may emerge as the best self-organized solution to the task of identifying words in running speech. The very notion of a phoneme as an abstract entity analogous to a written letter of the alphabet poses several challenges and has been questioned. The perception of speech relies on many probabilistically weighted acoustic cues, and integration of information across several acoustic dimensions is a central characteristic of auditory processing [36]. For example, as many as 16 acoustic dimensions may characterize the perception of voicing contrasts, as in the difference between English /ba/ and /pa/ [129]. Furthermore, the variety of actual pronunciations for any linguistic chunk that speakers may hear is seemingly unlimited and may vary along many continuous parameters, including individual speakers' voices, regional variations and subtleties of the social and pragmatic context. Since babies learn to recognize such a range of minute phonetic and temporal differences and can control many of them in their own speech as they grow, it is problematic to assume that linguistic memory could store only a canonical, abstract representation based on a minimal number of “distinctive” phonetic features in serial order. Thus, the comprehension and production of speech is a good example of how language does not appear to live in a low-dimensional phonological code. Speech decoding requires a far richer and more concrete representation of acoustic events necessary for storing concrete fragments and for the emergence of invariant linguistic properties (see [130]).
Furthermore, while lexicon and syntax have been traditionally treated as requiring separate mechanisms and underlying neural bases, their emergence in language acquisition appears to be tightly coupled by mutual, progressive loops [131,132]. Here, emergence is closely tied to the concept of circular causality discussed above, because grammar is an emergent property of slow, incremental local abstractions over an increasingly larger repertoire of words and phrases. Another ambitious program for emergentism is to demonstrate that the existence of relatively small variations in the options available to all languages may result from optimal emerging solutions among the constraints on communication and biological constraints on processing sequences of sounds [133]. For example, the widespread appearance of systems of 5 vowels (e.g., Latin, Italian, Spanish, Swahili, etc.) possibly reflects an optimal number, given constraints on speech production and perception. Hence 5-vowel systems are widespread and stable over long periods of historical time. Similarly, constraints on sequential learning abilities may favor certain word orders over others [134]. For a thorough discussion of where putative language universals may come from, see [135].
Nonlinearity. Small quantitative changes in one or more components of the language system or its input can lead to reorganization and large qualitative differences in behavior [136,137]. Nonlinear behavior will help us explain strong individual differences in normally developing children, unaccounted for by the standard model, as well as departures from normal language abilities. For example, apparently subtle low-level phonological deficits [138,139] and/or processing deficits for sequential patterns and sounds [140] may have profound and detrimental cascading effects on higher-order morpho-syntactic development, resulting in syndromes such as Specific Language Impairments.
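The general point that a small quantitative change can produce a large qualitative difference can be demonstrated with a generic bistable system. The sketch below is a textbook-style toy and makes no claim to model language development or impairment: the equation dx/dt = x - x³ + bias, the parameter values, and the function name `settle` are assumptions chosen purely for illustration.

```python
def settle(bias, x=-1.0, dt=0.01, steps=20000):
    """Euler-integrate dx/dt = x - x**3 + bias and return the final state.

    For small bias the system has two stable states, one negative and one
    positive. Once bias exceeds roughly 0.385, the negative state
    disappears and the system is carried to the positive one.
    """
    for _ in range(steps):
        x += dt * (x - x ** 3 + bias)
    return x


if __name__ == "__main__":
    # Two nearly identical parameter values, very different outcomes.
    for bias in (0.38, 0.40):
        print(f"bias={bias:.2f} -> final state {settle(bias):+.2f}")
```

The two runs differ only by 0.02 in a single parameter, yet one remains in the negative state and the other ends in the positive state. In a linear system the same change in input would produce only a proportionally small change in outcome.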
Furthermore, in a dynamical language system, multiple forces (biological, social, and maturational constraints) are likely to shape speakers' linguistic experience in non-linear and unique ways [141]. Therefore, in contrast to more traditional approaches to language, there is likely to be growing interest in individual differences. Implicitly or explicitly, it is widely assumed that all native speakers uniformly possess the same core language competence, with variations in performance relegated to extralinguistic processing factors such as working memory capacity. However, considerable differences have been found in processing core aspects of syntactic knowledge. Dabrowska and Street [142] showed that participants with more schooling performed better than less educated participants in recognizing the agent (the ‘doer’) of simple passive sentences. Less educated participants seemed to rely on simple processing heuristics rather than syntactic arguments. More surprisingly, the authors found that second language learners performed better than less educated native speakers in the same syntactic task. This superior result was attributed to possible effects of explicit grammatical instruction on the syntactic productivity of the non-native speakers (similar results were obtained for the comprehension of complex syntactic structures [143], and with native and non-native speakers of Japanese [144]). In a dynamical system, then, it becomes important to investigate the specific trajectories of learning as they are shaped monthly or even weekly by specific exposure to environmental circumstances [145]. In such a characterization, language learning ceases to be the inevitable outcome of a single maturational program pre-wired in the brain.

Conclusions
We have presented a new visual and conceptual apparatus for interpreting and explaining language in the mind. The importance of such a change in how we visually depict our models of language should not be underestimated. Similar changes in scientific visualization were a driving force in early 20th century physics. During the emergence of quantum physics in the 1920s, scientific visualizations of the atom revealed that physicists were beginning to “worry less about what atoms are” and instead to “think more about what they do” [146]. More recently, the growing field of scientific visualization is developing a wide range of techniques for identifying patterns in complex multi-dimensional data sets that lead to the formulation and testing of theories in a variety of disciplines [1,2,147]. The scientific visualization offered by a dynamical-systems approach to language processing and acquisition encourages us to focus not so much on which locations in state space are visited, i.e., the representations that are constructed, but instead on how language travels through its state space, i.e., the continuous trajectories that are pursued. What does the future hold for the language sciences? We can envision two possible scenarios. In one scenario the two views of language expounded here are seen as competing explanations. As such, the ongoing advances of the newer perspective on language promise to assist the rest of cognitive science in a transition from its classical framework, in which symbolic representations were openly described as the indivisible atoms of the mind [148–150], to the more continuous and distributed framework of dynamical cognition [12,15–17,100]. In a second scenario, the two frameworks will coexist as complementary descriptions at different levels of abstraction; they will ask and answer different questions about the nature of language. Again, an equivalent precedent in the history of physics may be invoked, that being the time when physicists reached a consensus about the dual nature of light (as both a particle and a wave), because experimental evidence supports both theories. Whether the new language framework proposed here will supplant the standard one or whether the two frameworks will coexist is hard to tell. This paper offers a first step toward conceptualizing language in a novel way that arguably goes beyond several received assumptions of language, naturally accommodates a number of findings that are problematic for the traditional perspective and proposes some possible avenues for novel research.

Figure 2. Two population codes (circled with dashed lines in the rightmost top panel) are mutually competing, and one will win. The figure depicts a theoretical mapping of the equivalence, over four 100 ms time periods, among a neural population code's distributed activity pattern (top row, where red-orange-yellow means greater activity), its cell firing rates (second row), activation of the mental representations that correspond to the two population codes (third row), and the trajectory through state space that settles into one attractor (i.e., stable population code) rather than the other (fourth row). The spike rates of these cells can be treated as coordinates in a 12-dimensional space, of which the bottom row depicts a 2-dimensional compression. During the first time period (column t1), the two partially overlapping groups of cells each have a few of their members somewhat active and spiking (responding to combinations of phonetics, semantics, syntax, etc.). In the third row, the two representations are slowly emerging during this initial time period (where the activation curves are starting to rise over time). In the fourth row, the state of the system is just beginning to move toward the two attractors (dashed circles) in state space. During the second time period (column t2), both population codes (i.e., representations) are continuing to increase in their respective activation patterns, and the trajectory is moving even closer to the two attractors. (Solid portions of a curve/trajectory indicate activity change during that time period, and dashed portions indicate the history of that activity change.) During the third time period (column t3), competition causes the left-hand group of cells to continue increasing its activity while the right-hand group decreases, so the state-space trajectory begins to move toward one attractor and away from the other. During the fourth time period (column t4), one population code (dashed circles) is nearly saturated while the other is nearly silent; its representation has risen to a high level of activation while the other is nearly inactive; and the trajectory has fully entered the perimeter of a particular attractor.
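
The competition described in this caption can be illustrated with a toy simulation. The sketch below is not the model behind the figure; it assumes a generic winner-take-all system in which two activations grow logistically and inhibit each other. The function names, the cell groupings, the inhibition strength, the starting values and the snapshot times are all hypothetical choices made for illustration. With a small initial advantage for one code, both activations first rise together, then diverge, and the state finally settles near one attractor.

```python
# Illustrative sketch only: a generic winner-take-all competition, not the
# authors' model. All names and parameter values below are assumptions.

N_CELLS = 12
# Two partially overlapping groups of cells (cells 5 and 6 belong to both).
GROUP_A = set(range(0, 7))
GROUP_B = set(range(5, 12))


def simulate(a0=0.06, b0=0.05, inhibition=2.0, dt=0.1, steps=400):
    """Euler-integrate two mutually inhibiting activations.

    Returns the list of (a, b) states, i.e. the trajectory through a
    2-dimensional summary of the state space.
    """
    a, b = a0, b0
    trajectory = [(a, b)]
    for _ in range(steps):
        # Each code grows logistically and is suppressed by its competitor.
        da = a * (1.0 - a - inhibition * b)
        db = b * (1.0 - b - inhibition * a)
        a, b = a + dt * da, b + dt * db
        trajectory.append((a, b))
    return trajectory


def firing_rates(a, b):
    """Map the two activations onto 12 hypothetical firing rates in [0, 1]."""
    rates = []
    for i in range(N_CELLS):
        r = (a if i in GROUP_A else 0.0) + (b if i in GROUP_B else 0.0)
        rates.append(min(r, 1.0))
    return rates


trajectory = simulate()
# Four snapshots, loosely analogous to columns t1..t4 in the figure.
snapshots = [trajectory[k] for k in (20, 60, 120, 400)]
for label, (a, b) in zip(("t1", "t2", "t3", "t4"), snapshots):
    print(label, round(a, 3), round(b, 3))
final_rates = firing_rates(*trajectory[-1])
```

Here the pair `(a, b)` plays the role of the 2-dimensional compression in the bottom row, while `firing_rates` expands a state back into a 12-dimensional pattern of cell activity. Swapping the two starting values sends the trajectory to the other attractor, which shows how a small early bias in the cues can decide the outcome.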