Proximity Loses: Real-Time Resolution of Ambiguous Wh-Questions in Japanese

Nakamura, Chie; Flynn, Suzanne; Miyamoto, Yoichi; Yusa, Noriaki

doi:10.3390/languages10120288

by Chie Nakamura ¹,*, Suzanne Flynn ², Yoichi Miyamoto ³ and Noriaki Yusa ⁴

¹ School of International Liberal Studies, Waseda University, Tokyo 169-0051, Japan

² Department of Linguistics and Philosophy, Massachusetts Institute of Technology, Cambridge, MA 02139, USA

³ Graduate School of Humanities, The University of Osaka, Osaka 560-0043, Japan

⁴ Department of English, Faculty of Liberal Arts, Miyagi Gakuin Women’s University, Miyagi 981-0961, Japan

* Author to whom correspondence should be addressed.

Languages 2025, 10(12), 288; https://doi.org/10.3390/languages10120288

Submission received: 12 August 2025 / Revised: 12 November 2025 / Accepted: 12 November 2025 / Published: 26 November 2025

Abstract

This study investigated how Japanese speakers interpret structurally ambiguous wh-questions, testing whether filler–gap resolution is guided by syntactic resolution based on hierarchical structure or linear locality based on surface word order. We combined behavioral key-press responses with fine-grained eye-tracking data and applied cluster-based permutation analysis to capture the moment-by-moment time course of syntactic interpretation as sentences were processed in real time. Key-press responses revealed a preference for resolving the dependency at the main clause (MC) gap position. Eye-tracking data showed early predictive fixations to the MC picture, followed by shifts to the embedded clause (EC) picture as the embedded event was described. These shifts occurred prior to the appearance of syntactic cues that signal the presence of an EC structure, such as the complementizer -to, and were therefore most likely guided by referential alignment with the linguistic input rather than by syntactic reanalysis. A subsequent return of the gaze to the MC picture occurred when the clause-final question particle -ka became available, confirming the interrogative use of the wh-phrase. Both key-press and eye-tracking data showed that participants did not commit to the first grammatically available EC interpretation but instead waited until clause-final particle information confirmed the interrogative use of the wh-phrase, ultimately favoring the MC interpretation. This pattern supports the view that filler–gap resolution is guided by structural locality rather than linear locality. By using high-resolution temporal data and statistically robust analytic techniques, this study demonstrates that Japanese comprehenders engage in predictive yet structurally cautious parsing. 
These findings challenge earlier claims that filler–gap resolution in Japanese is primarily driven by linear locality and instead reveal a preference for resolving dependencies at the structurally higher MC position, consistent with parsing biases previously observed in English, despite typological differences in word order between the two languages. This preference also reflects sensitivity to language-specific morpho-syntactic cues in Japanese, such as clause-final particles.

Keywords:

filler–gap dependency; wh-question interpretation; Japanese sentence processing; structural locality; linear locality; eye-tracking; permutation analysis; predictive parsing

1. Introduction

A “filler–gap” dependency occurs when a wh-phrase or another nominal element (the filler) is displaced from its canonical position in a sentence. In turn, this displaced nominal element must be linked to another position in the sentence (the gap) in order to be properly interpreted. At the gap site, the filler fulfills its grammatical role as required by the verb or another predicate, such as serving as the object of the verb. For instance, in (1), the filler who is interpreted as the object of saw, even though it appears at the beginning of the sentence and is separated from its original canonical position by an embedded clause. Such dependencies that span across a clause boundary are referred to as long-distance dependencies.

(1): Who did Mary say that John saw __ at the party?

A central question in psycholinguistic research is when and how comprehenders resolve these long-distance dependencies during real-time sentence processing. In the types of structures most commonly investigated, namely wh-questions and relative clauses, the filler typically appears at the beginning of the sentence, while the gap is encountered later in the linear sequence. This raises an important question: Do comprehenders attempt to link the filler to a gap as early as possible during parsing of the sentence structure, adopting what has been termed an active resolution strategy, or do they instead wait until additional structural evidence becomes available, following a wait-and-see strategy? This issue has motivated the development of several distinct theoretical accounts of sentence processing.

One influential account is the Active Filler Strategy (Frazier, 1987), which proposes that upon encountering a displaced filler, such as a wh-word, comprehenders actively search for the earliest grammatically permissible gap site to interpret the filler. In other words, they attempt to identify a gap where the filler can be assigned its syntactic role (e.g., subject or object) as soon as the parser encounters a position compatible with that role, without waiting for further confirmation. This assumption is supported by studies on the Active Gap-Filling Hypothesis (Crain & Fodor, 1985; Stowe, 1986), which indicate that listeners attempt to link a wh-phrase to a gap as soon as a grammatically licit position becomes available. Processing difficulty arises when the predicted gap site is unexpectedly filled by an overt noun phrase. For example, in (2), who is initially expected to be the object of bring, but this expectation is violated when the direct object us appears, forcing comprehenders to revise their initial interpretation.

(2): My brother wanted to know who Ruth will bring us home to __ after the party.

Much of the empirical support for the Active Filler Strategy comes from studies conducted in head-initial languages like English. In these languages, comprehenders often show an early preference for main clause (MC) interpretations when processing structurally ambiguous wh-questions. For example, in (3), the wh-word where can be interpreted as referring either to the location of the “telling” event (i.e., MC interpretation) or to the location of the butterfly-catching event (i.e., embedded clause (EC) interpretation). Using methodologies such as self-paced reading and eye-tracking, previous studies have shown that English speakers typically favor the MC interpretation, consistent with a bias toward early gap resolution (Traxler & Pickering, 1996). Similar findings have been reported in other head-initial languages such as German (Gibson, 1998).

(3): Where did Lizzie tell someone that she was going to catch butterflies?

Although previous findings support the view that comprehenders resolve wh-dependencies as early as possible, a critical question remains: What guides this early commitment in filler–gap dependency resolution? Two possible explanations have been proposed, one based on hierarchical syntactic structure and the other on linear locality constraints. The first account suggests that comprehenders prioritize syntactically more prominent positions, resolving dependencies in higher structural positions, such as the matrix clause, whenever possible. For example, in a sentence like (3), comprehenders may interpret the wh-phrase where as referring to the location of the matrix verb tell, rather than the embedded verb catch. This preference for MC interpretations aligns with parsing theories that emphasize structural simplicity and grammatical prominence (Frazier & Fodor, 1978; Frazier, 1987). According to this view, comprehenders favor interpretations that involve fewer levels of clausal embedding because these require fewer processing resources and reflect the syntactic hierarchy of the sentence. In this framework, the matrix clause, being structurally higher, serves as a default attachment site for the filler. Thus, when comprehenders encounter a wh-phrase, they initially attempt to resolve the dependency within the main clause before considering more deeply embedded clauses.

In contrast, the locality-based account argues that the dependency resolution is guided by linear proximity. Comprehenders are assumed to favor gap positions that are closer to the wh-phrase in the surface order, thereby minimizing the working memory demands associated with maintaining an unresolved dependency. A prominent formalization of this idea is the Dependency Length Minimization Hypothesis (Gibson, 1998), which posits that dependencies involving shorter linear distances between syntactically related elements are easier to process than those involving longer distances (Gibson, 2000; Lewis & Vasishth, 2005). According to the Dependency Locality Theory, even when dependencies are not structurally simpler in terms of hierarchical embedding, that is, when the filler must be linked to a deeply embedded position in the syntactic tree, they can still be easier to process if the surface linear distance is short, as measured by the number of intervening discourse referents or syntactic heads between the filler and the gap.

Importantly, while structural and locality-based accounts offer competing explanations for how filler–gap dependencies are resolved, both are compatible with the Active Filler Strategy, which posits that comprehenders attempt to resolve dependencies as early as possible, at least in English. This convergence arises because, in English, the preference for resolving the dependency at the MC site aligns with predictions from both theoretical perspectives. Specifically, the matrix gap site, such as the verb tell in example (3), is both structurally higher in the syntactic structure and linearly closer to the filler than the embedded verb catch. As a result, evidence from English alone does not allow us to disentangle whether early gap resolution is primarily guided by syntactic structure or by linear proximity.

Japanese, in contrast, provides an ideal test case for teasing apart these two accounts. Owing to its complement clause structure, Japanese reverses the linear order of embedded and main clauses. For example, the English sentence (3) corresponds to the Japanese structure in (4). In this construction, the EC choucho-o tsukamaeta (caught butterflies) precedes the MC verb iimashita (told). This word order creates a configuration in which the predictions of the structural and locality-based accounts come into direct conflict. In sentences like (4), the structural account predicts that comprehenders will associate the filler (Doko, where) with the syntactically higher MC verb (iimashita, told), while the locality-based account predicts that the dependency will be resolved at the linearly closer EC verb (tsukamaeta, caught).

(4): Doko-de Lizzie-wa [pro choucho-o tsukamaeta to] iimashita-ka?
Where-at Lizzie-TOP [pro butterfly-ACC catch-PST COMP] tell-PST-Q
“Where did Lizzie tell (someone) she was going to catch butterflies?”
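The opposing predictions of the linear-locality account for (3) and (4) can be made concrete with a small sketch. It counts token positions between the wh-phrase and each verb and reports which verb is linearly closer. The tokenization, the omission of the null subject pro, and the use of verb position as a stand-in for the gap site are simplifying assumptions made here for illustration; the function names are hypothetical.

```python
# Token sequences simplified from examples (3) and (4). The tokenization and
# the use of the verb position as a proxy for the gap site are assumptions.
english = ["Where", "did", "Lizzie", "tell", "someone", "that", "she",
           "was", "going", "to", "catch", "butterflies"]
japanese = ["Doko-de", "Lizzie-wa", "choucho-o", "tsukamaeta", "to",
            "iimashita-ka"]


def linear_distance(tokens, filler, verb):
    """Number of token positions separating the filler from the verb."""
    return tokens.index(verb) - tokens.index(filler)


def closer_verb(tokens, filler, mc_verb, ec_verb):
    """Return 'MC' or 'EC', whichever verb is linearly closer to the filler."""
    mc = linear_distance(tokens, filler, mc_verb)
    ec = linear_distance(tokens, filler, ec_verb)
    return "MC" if mc < ec else "EC"


english_pick = closer_verb(english, "Where", "tell", "catch")
japanese_pick = closer_verb(japanese, "Doko-de", "iimashita-ka", "tsukamaeta")
print(english_pick, japanese_pick)
```

Under these assumptions the linearly closest verb is the MC verb in English and the EC verb in Japanese, whereas the structurally higher verb is the MC verb in both languages.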

In fact, this precise contrast was empirically tested by Aoshima et al. (2004). They conducted a series of self-paced reading experiments using Japanese wh-question sentences as in (5), where the EC precedes the MC. Their results revealed increased reading times at the EC verb and complementizer sequence (yonda-to, “read”), which they interpreted as a filler–gap effect, indicating that comprehenders had attempted to associate the fronted wh-phrase (Dono seito-ni, to which student) with a gap in the EC, resolving the filler–gap dependency at the linearly closer site. These findings support a locality-based account, according to which dependency resolution is guided by linear proximity rather than structural locality.

(5): Dono-seito-ni tannin-wa Koocyoo-ga hon-o yonda-to tosyositu-de sisyo-ni iimasita-ka?
Which student-DAT class teacher-TOP principal-NOM book-ACC read-DeclC library-AT librarian-DAT told-Q
‘Which student did the class teacher tell the librarian at the library that the principal read a book for?’

Similarly, Omaki et al. (2014) conducted a study investigating the interpretation of Japanese wh-questions as in sentence (4), which provides a particularly transparent contrast with their English counterparts due to the reversal of EC and MC order. While Aoshima et al. (2004) focused on real-time processing in Japanese, Omaki et al. extended this line of work by directly comparing offline interpretations across Japanese and English using sentence structures that were syntactically simpler and lacked additional embeddings, in order to isolate cross-linguistic differences in filler–gap resolution preferences. Consistent with Aoshima et al.’s findings, Omaki et al. found that Japanese participants interpreted the wh-phrases as being associated with the EC, rather than the MC. This pattern suggests that when comprehenders encounter a wh-phrase, they attempt to resolve it at the first grammatically licit gap site they encounter in the linear order of the sentence, even if that site is structurally deeper in the syntactic tree. In other words, rather than prioritizing structurally higher sites such as the MC, Japanese speakers show a preference for resolving filler–gap dependencies in the linearly closer clause, which challenges theories that emphasize structural hierarchy. Taken together, the results from these studies support the view that active gap filling may be more strongly influenced by linear locality than by syntactic prominence.

However, this view is at odds with insights from Japanese linguistic theory (e.g., Kuno, 1973; Hoji, 1985; Miyagawa, 2010), which emphasizes the central role of case-marking particles and clause-final elements in determining syntactic relationships. Unlike English, where word order is the primary cue to syntactic structure, Japanese relies heavily on morphosyntactic markers such as case particles (e.g., -ga for nominative, -o for accusative, -ni for dative) and complementizers (such as to, ka) to signal argument structure and clause type. As Saito (2021) points out, Japanese wh-phrases such as doko (where) are not morphologically marked for interrogative use and can appear in both questions and non-questions, as illustrated in (6b). From this perspective, the interrogative status of a wh-phrase is not determined until the comprehender encounters a licensing element, typically a clause-final question particle such as ka, as in (4), (5), and (6a).

(6): a. Doko-ni ikimasu-ka? (Where are you going?)
b. Doko-ni ikutomo iwanakatta. ((Someone) did not tell where he was going.)

This theoretical perspective suggests that these typological features of Japanese may influence how speakers process wh-questions. Unlike English wh-phrases such as what or where, which are unambiguously marked for interrogative force and typically appear in sentence-initial position, Japanese wh-phrases like doko (where) are not morphologically marked for interrogative use. Consequently, Japanese listeners must rely on downstream morphosyntactic cues that appear later in the sentence, particularly the clause-final question particle -ka, to determine that the utterance is interrogative. This delayed availability of interrogative force may, in turn, lead Japanese comprehenders to withhold full commitment to a gap site until the sentence-final position.

If these morphosyntactic constraints are taken into account, previous findings suggesting an EC preference (e.g., Aoshima et al., 2004; Omaki et al., 2014) may warrant re-evaluation. While these studies have often been interpreted as supporting a linear proximity account, alternative interpretations of their results remain equally plausible. In particular, the increased reading times observed at the EC verb and complementizer in Aoshima et al.’s (2004) study do not necessarily reflect early structural commitment to a gap site in the EC. One possibility, acknowledged by Aoshima et al. (2004) themselves, is that participants initially expected the wh-phrase to be associated with the MC, and the appearance of the EC triggered a revision of this expectation.

Another possibility is that the increased reading times at the EC verb reflect not active gap formation but rather the absence of sufficient morphosyntactic cues to support a stable structural commitment. Under this interpretation, comprehenders may have temporarily withheld assigning a dependency, postponing structural commitment until more disambiguating information such as clause-final morphology became available later in the sentence. Thus, the increased reading time may reflect interpretative uncertainty or hesitation, rather than conclusive evidence of early gap filling in the EC. As for Omaki et al.’s (2014) results, their findings were based on end-of-sentence comprehension tasks, which reflect participants’ final interpretations rather than the incremental time course over which those interpretations are derived. As such, their data do not provide direct evidence for early gap filling or active dependency resolution during incremental processing.

To address these limitations and more directly investigate the time course of dependency resolution, the present study adopted a visual world eye-tracking paradigm (Altmann & Kamide, 2004). This method provides a temporally fine-grained and naturalistic approach to examining wh-dependency resolution in real time. Unlike self-paced reading, which provides only isolated reading times at discrete sentence regions, eye-tracking continuously captures moment-by-moment shifts in participants’ visual attention as they process spoken language. In addition to the key-press response data recorded at the end of each trial, which index participants’ final interpretations, the eye-tracking data allow us to examine not only which interpretation comprehenders ultimately adopt (EC vs. MC), but also when such interpretive preferences begin to emerge during sentence processing. A more detailed description of the experimental procedure is provided below.

In addition to using the fine-grained temporal resolution of eye-tracking, we also employed a cluster-based permutation analysis to examine the time course of wh-dependency resolution. This non-parametric approach allowed us to assess the reliability of observed effects across time without relying on pre-defined analysis windows. By combining this analytic method with temporally sensitive data, we tested whether Japanese wh-dependency resolution reflects early structural commitment or later revision, and whether it is guided more strongly by hierarchical syntactic structure than by linear proximity.

2. Materials and Methods

2.1. Participants

We recruited 36 native speakers of Japanese, all of whom were undergraduate students at a university in Japan. All participants had normal or corrected-to-normal vision and provided informed consent. None of the participants reported current immersion in any language environment other than Japanese.

2.2. Materials and Design

We used a visual-world eye-tracking paradigm to investigate how Japanese speakers process structurally ambiguous wh-questions. Each trial began with a context sentence, followed by a wh-question (see Appendix A for the full set of stimuli and exact participant instructions). The context sentences preceding each target wh-question introduced two distinct events: the event corresponding to the EC of the wh-question (e.g., Lizzie catching butterflies in the park) and the event corresponding to the MC (Lizzie telling her friend about it in the schoolyard). Across trials, we counterbalanced the order in which the two locations (e.g., the park and the schoolyard) were introduced, such that each location appeared first in half of the trials and second in the other half. This resulted in two conditions: the EC-MC event order condition (7a) and the MC-EC event order condition (7b). This manipulation ensured that any observed preferences in interpreting the ambiguous target wh-questions would reflect structural parsing biases rather than memory-based effects.

(7): a. EC-MC event order condition:
Lizzie-wa ami-o tsukatte kouen-de chouchou-o tsukamaeta. Gogo, Lizzie-wa koutei-de sonokoto-o tomodachi-ni itta.
Lizzie-TOP net-ACC by using park-LOC butterfly-ACC caught afternoon Lizzie-TOP schoolyard-LOC that-thing-ACC friend-DAT said
“Lizzie caught butterflies in the park using her net. In the afternoon, she told her friend about it in the schoolyard.”
b. MC-EC event order condition:
Lizzie-wa koutei-de tomodachi-o mikaketa. Lizzie-wa kare-ni “ami-o tsukatte kouen-de chouchou-o tsukamaeta” to itta.
Lizzie-TOP schoolyard-LOC friend-ACC saw. Lizzie-TOP him-DAT “net-ACC by using park-LOC butterfly-ACC caught” COMP said
“Lizzie saw her friend in the schoolyard. She told him, ‘I caught butterflies in the park using my net’.”

After the context sentence was aurally presented, participants heard a wh-question, as in (8). While listening to the question, their eye movements were recorded as they viewed a visual display containing three scenes: one depicting the EC interpretation, one depicting the MC interpretation, and one distracter image unrelated to the sentence (see Figure 1). This experimental setup allowed us to monitor the real-time resolution of filler–gap dependencies, based on participants’ visual attention to the referents associated with each possible interpretation. At the end of the sentence, participants were prompted to indicate their final interpretation of the wh-phrase’s referent by pressing a key to select the picture that best matched their understanding of the sentence. This task was designed to capture their offline interpretation, complementing the online eye-movement data.

In addition, in order to test how Japanese speakers process filler–gap dependencies with or without structural ambiguity, we created two wh-question conditions: an Ambiguous condition and an Unambiguous condition. In the Ambiguous condition (8a), the question was compatible with two plausible interpretations: participants could interpret the wh-phrase as referring either to the location of the telling event (i.e., the schoolyard; MC interpretation) or to the butterfly-catching event (i.e., the park; EC interpretation). In contrast, the Unambiguous wh-question condition (8b) contained another wh-phrase douyatte (how), which constrained the question to refer only to the MC event (i.e., the telling event).

The combination of context and wh-question manipulations resulted in a 2 × 2 factorial design, with two context conditions (EC-MC event vs. MC-EC event) and two question conditions (Ambiguous vs. Unambiguous), yielding four experimental conditions. Each participant completed 24 target items and 36 unrelated filler items.

(8): a. Ambiguous wh-question condition:
Doko-de Lizzie-wa [pro chouchou-o tsukamaeta-to] iimashita-ka?
Where-LOC Lizzie-TOP [pro butterfly-ACC catch-PST COMP] tell-PST-Q
“Where did Lizzie tell (someone) she was going to catch butterflies?”
Answer: Ambiguous (could refer to either the telling location or the catching butterflies location)
b. Unambiguous wh-question condition:
Doko-de Lizzie-wa douyatte [pro chouchou-o tsukamaeta to] iimashita-ka?
Where-LOC Lizzie-TOP how [pro butterfly-ACC catch-PST COMP] tell-PST-Q
“Where did Lizzie tell (someone) how she was going to catch butterflies?”
Answer: Unambiguous (refers only to the schoolyard, the telling location)

3. Results

3.1. Key-Press Responses

Participants’ key-press responses were analyzed separately for each condition (filler, unambiguous, and ambiguous) using generalized linear mixed-effects models (GLMMs) with random intercepts for participants and items (Jaeger, 2008). We analyzed the key-press data separately for the three condition types because the dependent variable differed in interpretability across conditions (accuracy for filler and unambiguous items, response preference for ambiguous items). Within each condition, the model included trial order as a fixed effect.

For filler and unambiguous conditions, accuracy was coded as 1 (correct) or 0 (incorrect). One participant was excluded from analysis due to extremely low accuracy (25%) in the unambiguous condition. The remaining data were analyzed using binomial GLMMs with a logit link to estimate the log-odds of a correct response. Participants’ responses for filler items showed high accuracy (90.7%, SD = 29%), and the GLMM confirmed that accuracy was significantly above chance (β = 3.61, SE = 0.64, z = 5.61, p < 0.001), indicating that participants were attentive and consistently selected the correct referent. For unambiguous wh-questions (8b), the accuracy rate was 86.8% (SD = 33.9%) and the GLMM likewise showed that accuracy was significantly greater than chance (β = 2.46, SE = 0.66, z = 3.71, p < 0.001), confirming that participants were sensitive to syntactic cues and reliably interpreted the question as referring to the location of the MC event.

Crucially, for ambiguous wh-questions (8a), which allowed two equally plausible interpretations (MC and EC), we examined participants’ choices between the MC and EC pictures. As shown in Figure 2, participants showed a numerical preference for the MC interpretation, selecting the MC picture 54.9% of the time compared to 43.7% for the EC picture. To assess whether this difference reflected a statistically reliable bias, we analyzed the data using the same GLMM used for fillers and unambiguous questions. The dependent variable was binary, coded as 1 for an MC response and 0 for an EC response. The GLMM did not reveal a significant preference for the MC over the EC picture (β = 0.29, SE = 0.28, z = 1.05, p = 0.296). Given that the observed difference between MC and EC choices was small, it is likely that once participant- and item-level variability were taken into account, the overall bias toward MC interpretations did not reach significance. This outcome likely reflects the fact that when response proportions are close to chance (around 50%), a binary logit model has limited sensitivity to small graded differences in choice proportions (Jaeger, 2008).
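As a reading aid for the log-odds estimates reported above, the following minimal sketch converts a coefficient into a probability and recomputes the Wald z statistic. It assumes that β is the model intercept and that the trial-order predictor is centered, neither of which is stated in the text; the helper names are illustrative and are not the authors' analysis code. Because the published values are rounded, a recomputed z can differ slightly from the reported one.

```python
import math


def inverse_logit(log_odds):
    """Convert a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def wald_z(beta, se):
    """Wald z statistic: the estimate divided by its standard error."""
    return beta / se


# Rounded estimate reported for the ambiguous condition (MC = 1, EC = 0).
beta, se = 0.29, 0.28
p_mc = inverse_logit(beta)  # model-implied probability of an MC choice
z = wald_z(beta, se)
print(round(p_mc, 3), round(z, 2))
```

A log-odds of 0 corresponds to a probability of 0.5, so testing whether β differs from 0 amounts to testing whether MC choices differ from an even split between the MC and EC pictures.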

To further examine whether a group-level bias was nevertheless present, we conducted a complementary analysis in which responses were aggregated by participant and item, and the proportion of MC choices was compared against chance (0.50) using a one-sample t-test. The analysis revealed that the MC picture was chosen significantly more than the EC picture (M = 0.55, SD = 0.27, t(418) = 2.01, p = 0.045), indicating a small but reliable interpretive bias toward the MC interpretation. The discrepancy between the non-significant GLMM result and the significant t-test likely reflects the conservative nature of mixed-effects models when accounting for random variability. While the GLMM provides a stringent test for participant- and item-level effects, the aggregated t-test offers a clearer view of the overall pattern at the group level. This behavioral pattern establishes an important baseline for interpreting real-time processing data from eye-tracking measures.
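The logic of the one-sample comparison against chance can be sketched as follows. The data are toy proportions invented for illustration and do not reproduce the study's values; how the study's responses were aggregated into the unit of analysis is not modeled here, and the function name is hypothetical.

```python
import math


def one_sample_t(values, mu0):
    """Return (t, df) for a one-sample t-test of mean(values) against mu0."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    se = math.sqrt(var / n)  # standard error of the mean
    return (mean - mu0) / se, n - 1


# Toy proportions of MC choices, invented for illustration only.
toy = [0.50, 0.75, 0.25, 1.00, 0.50, 0.75, 0.50, 0.25]
t_value, df = one_sample_t(toy, 0.50)
print(round(t_value, 2), df)
```

Unlike the GLMM, this test treats every aggregated data point as independent, which is one reason the two analyses can reach different conclusions on the same responses.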

3.2. Eye-Tracking Data

3.2.1. Permutation Analysis

In order to examine whether participants’ eye movements revealed differences in referential interpretation while processing structurally ambiguous wh-questions, we conducted a cluster-based permutation analysis. This non-parametric approach identifies time windows in which fixations to the MC and EC pictures differ significantly, without relying on arbitrary, pre-defined time intervals. This approach is particularly well-suited to identifying significant effects over time in a data-driven manner.

Following Chan et al. (2018), fixation data were segmented into 20 ms time bins across the full analysis time window, time-locked to the onset of the region of interest for each of the 24 target items. Because our dataset was recorded at a temporal resolution of 20 ms, this bin size ensured that the analysis matched the sampling rate of the eye-tracking data. For each 20 ms time bin, we fit a regression model with the proportion of fixations as the dependent variable and picture type (MC vs. EC) as the predictor, and extracted the t-value for the picture type coefficient¹. Adjacent bins with t-values corresponding to p < 0.05 were grouped into clusters. To assess the significance of these clusters, we ran 1000 permutations in which MC and EC labels were randomly reassigned within each trial, creating a null distribution of summed cluster-level t-values. If 50 or fewer of the 1000 permuted clusters had an absolute summed t-value equal to or greater than that of the observed cluster, the cluster was considered statistically significant at p < 0.05.
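The procedure described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' pipeline: it replaces the per-bin regression with a one-sample t-test on MC-minus-EC differences per trial, implements the within-trial label swap as a sign flip, uses a fixed threshold of |t| > 2.0 as a rough stand-in for p < 0.05, and compares each observed cluster with the largest cluster from each permutation. All function names and the simulated data are hypothetical.

```python
import math
import random


def bin_t(diffs):
    """One-sample t-value of MC-minus-EC differences in one time bin."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0:
        return 0.0  # no variability: treat the bin as uninformative
    return mean / math.sqrt(var / n)


def clusters(t_values, threshold):
    """Group adjacent same-sign bins with |t| > threshold.

    Returns a list of (start_bin, end_bin, summed_t) tuples.
    """
    found, start, total = [], None, 0.0
    for i, t in enumerate(t_values + [0.0]):  # sentinel closes a final cluster
        active = abs(t) > threshold
        if active and start is not None and (t > 0) == (total > 0):
            total += t
        else:
            if start is not None:
                found.append((start, i - 1, total))
                start = None
            if active:
                start, total = i, t
    return found


def cluster_permutation(diff_by_trial, threshold=2.0, n_perm=1000, seed=0):
    """diff_by_trial[trial][bin] holds the MC-minus-EC fixation difference."""
    rng = random.Random(seed)
    n_bins = len(diff_by_trial[0])

    def t_series(rows):
        return [bin_t([row[b] for row in rows]) for b in range(n_bins)]

    observed = clusters(t_series(diff_by_trial), threshold)
    null_max = []
    for _ in range(n_perm):
        # Swapping MC/EC labels within a trial flips the sign of its row.
        flipped = []
        for row in diff_by_trial:
            sign = rng.choice((1, -1))
            flipped.append([sign * d for d in row])
        perm = clusters(t_series(flipped), threshold)
        null_max.append(max((abs(c[2]) for c in perm), default=0.0))
    results = []
    for start, end, total in observed:
        p = sum(m >= abs(total) for m in null_max) / n_perm
        results.append((start, end, total, p))
    return results


# Simulated data: 24 trials x 30 bins, with an MC advantage in bins 10-19.
sim = random.Random(1)
data = [[sim.gauss(0, 1) + (1.5 if 10 <= b < 20 else 0.0) for b in range(30)]
        for _ in range(24)]
results = cluster_permutation(data, n_perm=500)
for start, end, total, p in results:
    print(f"bins {start}-{end}: summed t = {total:.1f}, p = {p:.3f}")
```

With 20 ms bins, a cluster spanning bins 10 to 19 would correspond to a 200 ms window. Summing t-values over adjacent bins is what lets the method control for multiple comparisons across time without fixing analysis windows in advance.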

To provide an overall view of gaze behavior across the structurally ambiguous question sentence, we first report the results of a permutation analysis time-locked to the onset of the entire sentence (Figure 3). This initial analysis offers a general picture of how listeners’ visual attention unfolded over time. However, because the analysis was conducted from sentence onset, it is not time-aligned with the onsets of later-emerging grammatical cues such as the accusative case marker -o, the embedded clause marker -to, or the sentence-final question particle -ka. Since the timing of these cues varied across the 24 target sentences, aligning all trials to the beginning of the sentence may result in temporal misalignment, thereby reducing sensitivity to effects associated with these grammatical cues. As a result, structure-sensitive responses, such as whether listeners revised their structural interpretation in response to them, may not be detected. To address this limitation, we also report a second analysis focused on a linguistically meaningful sub-region: from the onset of the EC event region preceding the EC marker -to (chouchou-o tsukamaeta; “butterflies-ACC caught”) through to the sentence offset (Figure 4).

3.2.2. Whole-Sentence Analysis

Figure 3 shows the time course of the proportion of looks to the MC and EC pictures, time-locked to the onset of the ambiguous wh-question. The blue line indicates the MC picture, and the red line indicates the EC picture. Time windows in which the proportion of looks to the two pictures differed significantly are shaded in grey, based on the results of a permutation analysis. The green and orange bars near 0.2 represent p-values for individual 20 ms time bins, with green indicating bins that reached significance (p < 0.05) and orange indicating bins that did not, prior to cluster formation.

The analysis revealed two significant clusters in which the proportion of looks to the MC and EC pictures differed significantly. The first cluster, occurring in the 880–980 ms time window following question onset, showed a higher proportion of looks to the MC picture. In contrast, the second cluster, occurring in the 1680–2400 ms time window following question onset, showed more looks to the EC picture. A summary of these significant clusters is provided in Table 1.

The early cluster (880–980 ms) likely reflects participants’ initial prediction about how the question would continue upon hearing the sentence-initial fragment (Doko-de Lizzie-wa “Where did Lizzie”), before any disambiguating information signaling the presence of the EC was introduced. The increased fixation to the MC picture during this window suggests that listeners anticipated the question would concern the MC event (e.g., iimashita, “told”). Since the verb itta (told) is highly unlikely to appear within an EC following this fragment, fixation to the location of the MC event indicates that participants adopted an MC interpretation of the wh-question (“Where did Lizzie tell?”).

This early prediction may have been shaped by several factors. Although the order of the two events in the MC and EC introduced in the preceding context (e.g., “Lizzie catching butterflies” and “Lizzie seeing her friend to tell him about it”) was counterbalanced across trials, the context sentence consistently ended with itta (“told”, as in (7a, b)). This recency effect of itta in the preceding context may have made the telling event more salient and cognitively accessible, thereby biasing participants to expect that the upcoming question would continue with that event. In addition to the contextual influence, listeners may have relied on a general structural bias: MC constructions tend to be more frequent in natural language use and are often treated as the default interpretation during sentence processing (Ferreira, 2003; Trueswell et al., 1994). This bias, combined with the consistent appearance of the verb itta (told) at the end of the context sentence, may have contributed to the increased fixations on the MC picture during the early time window. These early fixations suggest that listeners initially expected the question to continue with an MC structure, such as where did Lizzie tell?

Following this early effect, the increased fixations to the EC picture observed in the second significant cluster (1680–2400 ms) likely reflect a shift in processing as new linguistic input became available. During this time window, critical elements of the EC appeared, such as chouchou-o (butterflies-ACC), suggesting that the question might concern the butterfly-catching event. While the MC interpretation remains syntactically viable at this point, the accusative marker -o in chouchou-o (butterflies-ACC) may also serve as an early cue that the sentence is departing from the MC frame, prompting participants to begin considering an EC interpretation even before the complementizer -to is heard.

To interpret this pattern of increased fixations on the EC picture, we consider two possibilities. One is that the observed shift reflects syntactic reanalysis, whereby listeners revise their initial prediction and restructure the sentence from an MC to an EC interpretation. Alternatively, the increased fixations may reflect the continuation of structure-based interpretation under a temporarily preferred MC analysis. In this case, listeners maintained the MC structure but revised their interpretation of the event being queried. That is, they continued to treat the utterance as a mono-clausal wh-question under an MC analysis, but shifted their gaze to the EC picture as their interpretation shifted from Where did Lizzie say? to Where did Lizzie catch butterflies? The question of whether this pattern reflects syntactic reanalysis or a revision within an ongoing MC interpretation is addressed in more detail below, in the EC-to-sentence-final region analysis.

Importantly, regardless of which interpretation listeners adopted up to this point, the appearance of the EC marker -to in tsukamaeta-to (caught-COMP) provided the first unambiguous syntactic cue that the sentence involved an EC. At this point, a structural reanalysis from an MC to an EC interpretation became grammatically necessary. To assess whether listeners subsequently attached the fronted where-phrase to the EC following this structural disambiguation, we examine gaze behavior in the sentence-final region, where the EC structure had become explicit.

3.2.3. EC-to-Sentence-Final Region Analysis

To examine how listeners resolved the attachment of the fronted where-phrase after the sentence structure was unambiguously revealed to involve an EC, we conducted a separate permutation analysis focused on the latter portion of the sentence. While the EC marker -to serves as the disambiguating cue for the presence of an EC, we include the period preceding it, beginning with the onset of the EC noun and verb regions chouchou-o tsukamaeta (butterfly-ACC caught), to capture the complete time course of how listeners responded to the unfolding EC-related information. As previously noted, the whole-sentence analysis reported in Figure 3 was time-locked to the onset of the entire question, which may not have accurately captured effects tied to later-occurring cues, such as the EC marker or the onset of the MC verb, whose timing varied across items. This temporal misalignment may have diluted or obscured effects, particularly in the later portions of the sentence. By realigning the analysis window to the onset of the EC noun phrase chouchou-o (butterfly-ACC) and continuing through to the sentence offset, we are better able to track how gaze patterns evolved in response to structural disambiguation and how listeners ultimately resolved the attachment of the fronted wh-phrase.
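The effect of re-time-locking can be illustrated with a small Python sketch. The two trials, their region labels, and the cue onsets (1500 ms and 1700 ms) are hypothetical values chosen for illustration, not data from this experiment, and the function names are likewise assumptions.

```python
def realign_trial(samples, cue_onset_ms):
    """Shift sample times so that 0 ms is the item-specific cue onset.

    Samples recorded before the cue onset are dropped.
    """
    return [(t - cue_onset_ms, region) for t, region in samples if t >= cue_onset_ms]


def first_look(samples, region):
    """Time of the first sample on the given region, or None if there is none."""
    for t, r in samples:
        if r == region:
            return t
    return None


# Hypothetical trials: in both, gaze moves to the EC picture 200 ms after the
# EC noun phrase begins, but the noun phrase starts at different times.
trial_a = [(1400, "MC"), (1500, "MC"), (1600, "MC"), (1700, "EC"), (1800, "EC")]
trial_b = [(1600, "MC"), (1700, "MC"), (1800, "MC"), (1900, "EC"), (2000, "EC")]
onset_a, onset_b = 1500, 1700

# Time-locked to sentence onset, the two shifts are 200 ms apart.
print(first_look(trial_a, "EC"), first_look(trial_b, "EC"))

# Time-locked to the cue onset, the two shifts coincide.
aligned_a = realign_trial(trial_a, onset_a)
aligned_b = realign_trial(trial_b, onset_b)
print(first_look(aligned_a, "EC"), first_look(aligned_b, "EC"))
```

When trials are averaged bin by bin, responses that coincide after realignment reinforce one another, whereas the same responses spread over different bins under sentence-onset alignment are attenuated.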

Figure 4 shows the proportion of looks to the MC and EC pictures over the time window beginning at the onset of the EC noun phrase (chouchou-o, “butterfly-ACC”) and continuing through to the sentence offset.

The analysis revealed two significant clusters in which the proportion of looks to the two pictures differed significantly. A summary of these clusters is provided in Table 2. The first cluster, occurring during the EC region, showed a higher proportion of looks to the EC picture. Although this effect was already observed in the initial analysis reported in Figure 3, the current time-locked approach reveals more clearly that participants began shifting their gaze toward the EC picture well before the EC marker -to appeared.

In contrast, the second cluster showed a higher proportion of looks to the MC picture than to the EC picture. Notably, this effect was not detected in the initial whole-sentence analysis, likely because responses to later-occurring structural cues were temporally misaligned across items. By time-locking the analysis to the onset of the EC noun phrase, the current approach offers a clearer view of how gaze patterns evolved during and after the point of disambiguation for the presence of an EC. The pattern observed in this later time window mirrors participants’ key-press responses, in which they ultimately favored the MC interpretation. This suggests that the late shift in gaze toward the MC picture reflects the resolution of the wh-attachment, consistent with their final behavioral choice. As participants encountered the MC verb followed by the question particle (iimashitaka, “told-Q”), they appeared to have attached the fronted wh-phrase to the matrix predicate, adopting the MC interpretation.

3.2.4. Unambiguous Condition Analysis

The analyses reported above showed that in the ambiguous condition, participants shifted their gaze between the MC and EC pictures as the sentence unfolded, suggesting that they were incrementally evaluating possible interpretations of the wh-phrase. After the point of disambiguation for the presence of an EC, a late shift in gaze toward the MC picture mirrored participants’ key-press responses, indicating that the wh-attachment was ultimately resolved at the MC predicate. This return of gaze to the MC picture most likely reflects participants finalizing the MC interpretation once the clause-final question particle -ka became available and confirmed the interrogative use of the wh-phrase.

However, there remains the possibility that these gaze patterns simply reflected participants aligning their attention with the events being described in the sentence, rather than actively evaluating alternative structural interpretations. If this were the case, then in the unambiguous condition (8b), where the wh-phrase can only be interpreted in the MC structure, participants should nevertheless have shifted their gaze toward the EC picture when the EC event was described. If, in contrast, gaze patterns reflect syntactic interpretation, then participants should consistently maintain their preference for the MC picture across the sentence in the unambiguous condition.

To evaluate this possibility, we conducted a cluster-based permutation analysis for the unambiguous questions. As shown in Figure 5 and Figure 6, the results revealed an overall pattern of greater fixations on the MC picture. The whole-sentence analysis (Figure 5) revealed four significant clusters in which fixations to the MC picture exceeded those to the EC picture: the first from 1060 to 1260 ms (observed sum t = 46.21, p < 0.001), the second from 1980 to 2080 ms (observed sum t = 25.58, p < 0.001), the third from 2100 to 2300 ms (observed sum t = 47.99, p < 0.001), and the fourth from 3080 to 5320 ms (observed sum t = 148.05, p < 0.001). The EC-to-sentence-final region analysis (Figure 6) revealed one significant cluster, from 1000 to 2020 ms (observed sum t = 491.82, p < 0.001), again showing greater fixations to the MC picture at the end of the sentence.

These patterns indicate that participants maintained a stable preference for the MC interpretation: they exhibited an early predictive bias toward the MC picture, sustained this preference even when EC-related lexical material was presented, and reinforced it once the sentence-final particle became available. Importantly, unlike in the ambiguous condition, participants did not show temporary increases in looks to the EC picture when the EC event was mentioned in the unambiguous sentences. These findings confirm that in the absence of structural ambiguity, gaze patterns are anchored to the syntactic interpretation of the wh-phrase rather than simply reflecting the mention of events.

4. Discussion

The present study examined how Japanese speakers interpret structurally ambiguous wh-questions, aiming to distinguish between two competing accounts of filler–gap resolution: one based on structural locality and the other on linear locality. Specifically, we investigated how listeners resolve ambiguity between MC and EC interpretations in real time. Key-press responses to ambiguous wh-questions revealed a preference for the MC interpretation, indicating that Japanese speakers do not resolve filler–gap dependencies based solely on linear proximity. Eye-tracking data further illuminated the time course of processing, showing how listeners incrementally constructed their interpretation as the sentence unfolded. To capture this process, we conducted two permutation analyses. The first, a whole-sentence analysis time-locked to question onset, provided a broad view of gaze patterns in processing the entire sentence. The second analysis, time-locked to the onset of the EC region and continuing through to the sentence offset, focused on a narrower time window and allowed us to isolate participants’ responses to syntactic cues that emerged later in the sentence, such as the EC marker -to and the clause-final question particle -ka.

Findings from the whole-sentence analysis revealed two temporally distinct clusters in gaze behavior. The early cluster, occurring before participants were introduced to any critical lexical elements beyond the wh-phrase doko (where) and the subject Lizzie-NOM, showed a significant increase in fixations to the MC picture. This suggests that participants initially predicted that the question would concern the MC event (e.g., the act of telling), likely due to a combination of structural biases and the greater accessibility of the verb itta (told) relative to the embedded event. Although the two event locations were counterbalanced, itta (told) consistently appeared in the final position in the context sentence, making it especially salient and readily activated at the onset of the following wh-question. Moreover, itta (told) can occur only in the MC structure in fronted wh-questions, further constraining interpretation. These factors, together with a general bias toward MC interpretations over EC, likely contributed to this early predictive preference.

The second cluster, also observed in the whole-sentence analysis, showed increased fixations to the EC picture as the EC noun phrase (chouchou-o, butterflies-ACC) became available. As discussed above, one possible interpretation of this shift is that it reflects syntactic reanalysis, with listeners revising their initial MC prediction and restructuring the sentence from an MC to an EC interpretation. Alternatively, the increased fixations to the EC picture may reflect a revision of semantic interpretation under a sustained MC analysis. In this view, listeners retained the initially adopted MC structure but updated their interpretation of the event being queried, from Where did Lizzie say? to Where did Lizzie catch butterflies?, while still treating the question as an MC construction.

To test between the two possible interpretations of the increased fixations to the EC picture, namely whether they reflect syntactic reanalysis or semantic revision under sustained MC analysis, we conducted the EC-to-sentence-final region analysis, time-locked to the onset of the EC region. This analysis confirmed that the gaze shift to the EC picture occurred well before the appearance of the EC marking complementizer -to, which explicitly marks the presence of an EC. This timing most likely suggests that listeners were initially responding to the unfolding semantic content, such as the mention of butterflies, and that the increased fixations to the EC picture are unlikely to reflect syntactic reanalysis. The observed gaze pattern implies that participants may have tentatively entertained an EC interpretation as a plausible continuation of the sentence, aligning their attention with the referential content as it emerged. However, the subsequent increase in fixations to the MC picture during the final portion of the sentence, also captured in the EC-to-sentence-final region analysis, suggests that listeners ultimately returned to or confirmed an MC interpretation. This late shift in gaze aligns with the behavioral key-press responses, which also favored MC interpretations.

Taken together, these findings indicate that while listeners are sensitive to local semantic cues during incremental processing, the structural resolution of the filler–gap dependency ultimately depends on syntactic information that emerges later in the sentence, such as the matrix verb and sentence-final question particle.

These findings have several theoretical implications. First, they challenge the view that filler–gap resolution in Japanese is guided predominantly by linear locality, as proposed in earlier studies (e.g., Aoshima et al., 2004; Omaki et al., 2014). Participants temporarily shifted their gaze toward the EC picture as the EC noun phrase (chouchou-o, butterflies-ACC) unfolded, which may suggest that they were beginning to consider an EC interpretation. Yet, because this shift occurred prior to the appearance of the EC marker -to, it is more likely to reflect semantic revision under a sustained MC interpretation rather than syntactic reanalysis from an MC to an EC structure. The subsequent return to the MC picture, aligned with the sentence-final structural cues and mirrored in the key-press responses, indicates that participants ultimately resolved the dependency at the structurally higher MC site. These results suggest that while local linear or semantic cues may influence participants’ moment-to-moment attention, they do not override the role of structural locality in guiding filler–gap resolution.

Second, our results demonstrate that Japanese comprehenders engage in predictive processing, using contextual and morphosyntactic cues to anticipate upcoming structure. However, unlike comprehenders of languages such as English, where wh-phrases are morphologically marked, Japanese comprehenders appear to delay structural commitment until interrogative force is confirmed by sentence-final cues (e.g., clause-final particles). This pattern is consistent with accounts that emphasize the role of case-marking and clause-final particles in Japanese parsing (e.g., Kuno, 1973; Miyagawa, 2010).

The observed preference for resolving dependencies at the structurally higher MC site provides support for accounts that posit a syntactic hierarchy bias in filler–gap resolution. These accounts propose that comprehenders preferentially attach fillers to the highest grammatically available position in the tree structure (e.g., Frazier & Clifton, 1989; Phillips, 1996, 2006). Our findings in Japanese support the view that structure-based parsing preferences, such as those proposed under the Minimal Attachment hypothesis (Frazier & Fodor, 1978), operate as general parsing principles across languages. In both English and Japanese, comprehenders prefer to resolve filler–gap dependencies at the structurally higher MC site. However, unlike English speakers, who are often shown to commit to the MC interpretation early, Japanese comprehenders delay this commitment due to typological features such as the presence of clause-final disambiguating cues. Importantly, the ultimate preference for MC attachment in Japanese, despite the linear proximity of the EC, provides evidence for structure-based parsing over linear locality. This interpretation is in line with other studies of Japanese sentence processing, such as Tamaoka and Mansbridge (2019), who examined scrambled sentences and showed that readers often revisit earlier material to establish structural coherence. Their results, like ours, suggest that Japanese parsing cannot be reduced to simple linear locality.

Finally, we consider possible reasons why our results diverge from those of Aoshima et al. (2004) and Omaki et al. (2014), who reported linear locality-based preferences for the EC interpretation in Japanese filler–gap dependency construction. One possibility is that these earlier findings reflect methodological limitations. Because the self-paced reading and end-point judgment methods employed in the earlier studies did not capture the fine-grained time course of filler–gap dependency formation, they may have conflated temporary attention to the EC, possibly driven by semantic salience or recent mention, with actual dependency resolution, as suggested by our eye-tracking data.

In particular, it is worth noting that our offline judgment task and Omaki et al.’s task produced opposite outcomes: whereas our participants showed a preference for the MC interpretation, Omaki et al. found an EC preference. We suggest that this discrepancy is likely due to differences in task design. Omaki et al.’s experiment involved long story contexts introducing four different locations sequentially before the target wh-question. Crucially, the EC location was introduced toward the end of the story, while the MC location appeared at the beginning of the context. This asymmetry makes the EC location more recent and therefore more accessible at the time participants answered the question. Given that four different locations were introduced in each story, memory load was also higher, further increasing the likelihood that participants relied on the most recently mentioned location. Moreover, their study tested only 16 participants with eight items, limiting statistical power and making the results especially susceptible to such task-specific influences.

By contrast, our design minimized narrative load, balanced the visual referents, and directly tracked online interpretations, revealing that participants ultimately resolved the dependency at the MC site. The results suggest that Japanese comprehenders do not resolve filler–gap dependencies at the earliest grammatically available position. Rather, they appear to delay commitment until reliable structural cues, such as sentence-final particles, confirm that the sentence-initial wh-phrase is being used to form a question. This pattern suggests a predictive, yet structurally cautious parsing strategy, in which comprehenders build expectations incrementally but withhold commitment until syntactic confirmation becomes available, something that earlier methods may have failed to detect.

5. Conclusions

The present study provides evidence that Japanese comprehenders rely primarily on structural locality, rather than linear locality, in resolving filler–gap dependencies in wh-questions. Through a combination of behavioral and eye-tracking data, we showed that Japanese participants exhibit early predictive preferences for the MC interpretation, and although local semantic cues such as the EC noun phrase drew attention, final interpretations consistently aligned with the structurally prominent MC site. The pattern of early structural prediction, temporary sensitivity to semantic content, and ultimate commitment to the MC interpretation highlights the dynamic, yet structurally constrained, nature of real-time comprehension. The findings enhance our understanding of cross-linguistic variation in parsing strategies, suggesting that Japanese comprehenders engage in predictive processing but delay structural commitment until disambiguating cues such as clause-final particles become available. This pattern reflects a structurally guided parsing strategy that prioritizes structural locality over linear proximity, consistent with general parsing principles observed across languages. These results contribute to ongoing debates about the universality and variability of predictive parsing mechanisms and suggest that predictive systems can be influenced by typologically specific grammatical cues. Future work may investigate whether structurally driven predictive strategies extend to sentence types beyond filler–gap dependencies in Japanese, and how structurally based predictive biases develop over time, potentially shifting from linear gap-filling in early stages to structurally guided interpretation as language processing mechanisms develop.

Author Contributions

Conceptualization, C.N., S.F., Y.M. and N.Y.; methodology, C.N.; software, C.N.; validation, C.N., S.F., Y.M. and N.Y.; formal analysis, C.N.; investigation, C.N., S.F., Y.M. and N.Y.; resources, C.N., S.F., Y.M. and N.Y.; data curation, C.N., S.F., Y.M. and N.Y.; writing—original draft preparation, C.N.; writing—review and editing, C.N., S.F., Y.M. and N.Y.; visualization, C.N.; supervision, S.F., Y.M. and N.Y.; project administration, C.N., S.F., Y.M. and N.Y.; funding acquisition, C.N., Y.M. and N.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by JSPS KAKENHI Grant numbers 21KK0006 (Fostering Joint International Research) and 22K18476 (Grant-in-Aid for Challenging Research (Exploratory)) and Core-to-Core Program Grant number JPJSCCA20210001.

Institutional Review Board Statement

The study was approved by the Ethics Review Committee on Research with Human Subjects of omitted for peer review University, under protocol number 2024-130.

Informed Consent Statement

Informed consent was obtained from all participants involved in the study.

Data Availability Statement

The data used in this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

The experiment included 24 target wh-question items. Each sentence was presented in Japanese and appeared in two conditions. In the Ambiguous condition, the wh-phrase doko (“where”) could be interpreted as modifying either the matrix clause or the embedded clause. In the Unambiguous condition, the adverb douyatte (“how”) was inserted, which constrained the interpretation to the main clause and eliminated the attachment ambiguity.

Participants were given the following instructions on screen before starting the experiment: “In this experiment, you will hear a short sentence twice, and a question following it. Your task is to choose one of the pictures for your answer by pressing 1, 2, or 3.”

1	Doko de Lily wa (douyatte) tokei o katta to iimashita ka?
	‘Where did Lily tell someone (how) she bought a watch?’
2	Doko de Hannah wa (douyatte) takaramono o mitsuketa to iimashita ka?
	‘Where did Hannah tell someone (how) she found the treasure?’
3	Doko de Charlotte wa (douyatte) hon o yonda to iimashita ka?
	‘Where did Charlotte tell someone (how) she read the book?’
4	Doko de Ryan wa (douyatte) T-shatsu o tsukutta to iimashita ka?
	‘Where did Ryan tell someone (how) he made a T-shirt?’
5	Doko de Nick wa (douyatte) paatii o hiraita to iimashita ka?
	‘Where did Nick tell someone (how) he held a party?’
6	Doko de Annie wa (douyatte) uma ni notta to iimashita ka?
	‘Where did Annie tell someone (how) she rode a horse?’
7	Doko de Oliver wa (douyatte) shukudai o shita to iimashita ka?
	‘Where did Oliver tell someone (how) he did homework?’
8	Doko de Lizzie wa (douyatte) chouchou o tsukamaeta to iimashita ka?
	‘Where did Lizzie tell someone (how) she caught a butterfly?’
9	Doko de Olivia wa (douyatte) bengoshi to menkai shita to iimashita ka?
	‘Where did Olivia tell someone (how) she met with a lawyer?’
10	Doko de Josh wa (douyatte) okashi o katta to iimashita ka?
	‘Where did Josh tell someone (how) he bought sweets?’
11	Doko de Rachel wa (douyatte) pengin no shashin o totta to iimashita ka?
	‘Where did Rachel tell someone (how) she took a picture of a penguin?’
12	Doko de Thomas wa (douyatte) koohii o katta to iimashita ka?
	‘Where did Thomas tell someone (how) he bought coffee?’
13	Doko de Lucy wa (douyatte) kogitte o hakkou shita to iimashita ka?
	‘Where did Lucy tell someone (how) she issued a check?’
14	Doko de Jack wa (douyatte) tsuri o shita to iimashita ka?
	‘Where did Jack tell someone (how) he went fishing?’
15	Doko de Jeff wa (douyatte) uchuujin o mita to iimashita ka?
	‘Where did Jeff tell someone (how) he saw an alien?’
16	Doko de Emma wa (douyatte) akachan to asonda to iimashita ka?
	‘Where did Emma tell someone (how) she played with a baby?’
17	Doko de Susan wa (douyatte) shokuji o shita to iimashita ka?
	‘Where did Susan tell someone (how) she had a meal?’
18	Doko de Bill wa (douyatte) sakkaa o shita to iimashita ka?
	‘Where did Bill tell someone (how) he played soccer?’
19	Doko de Ben wa (douyatte) iruka o mita to iimashita ka?
	‘Where did Ben tell someone (how) he saw a dolphin?’
20	Doko de Haidi wa (douyatte) watagashi o katta to iimashita ka?
	‘Where did Heidi tell someone (how) she bought cotton candy?’
21	Doko de Ian wa (douyatte) baabekyuu o shita to iimashita ka?
	‘Where did Ian tell someone (how) he had a barbecue?’
22	Doko de James wa (douyatte) atarashii eiga o satsuei shita to iimashita ka?
	‘Where did James tell someone (how) he shot a new movie?’
23	Doko de John wa (douyatte) aisukuriimu o katta to iimashita ka?
	‘Where did John tell someone (how) he bought ice cream?’
24	Doko de Emily wa (douyatte) ki ni nobotta to iimashita ka?
	‘Where did Emily tell someone (how) she climbed a tree?’

Note

1

On average, the distractor picture attracted about 11% of looks, with the highest proportion occurring early in the sentence (~23%) and decreasing to below 10% thereafter. Because this rate is well below chance level (33% with three pictures), distractor looks were not considered further and are not reported in the results.

References

Altmann, G. T. M., & Kamide, Y. (2004). Now you see it, now you don’t: Mediating the mapping between language and the visual world. In The oxford handbook of psycholinguistics (pp. 555–566). Psychology Press.
Aoshima, S., Phillips, C., & Weinberg, A. (2004). Processing filler–gap dependencies in a head-final language. Journal of Memory and Language, 51(1), 23–54.
Chan, A., Yang, W. C., Chang, F., & Kidd, E. (2018). Four-year-old Cantonese-speaking children’s online processing of relative clauses: A permutation analysis. Journal of Child Language, 45(1), 174–203.
Crain, S., & Fodor, J. D. (1985). How can grammars help parsers? Natural language parsing: Psychological, computational, and theoretical perspectives (pp. 94–128). Cambridge University Press.
Ferreira, F. (2003). The misinterpretation of noncanonical sentences. Cognitive Psychology, 47(2), 164–203.
Frazier, L. (1987). Sentence processing: A tutorial review. In M. Coltheart (Ed.), Attention and performance XII: The psychology of reading (pp. 559–586). Lawrence Erlbaum Associates.
Frazier, L., & Clifton, C. (1989). Successive cyclicity in the grammar and the parser. Language and Cognitive Processes, 4(2), 93–126.
Frazier, L., & Fodor, J. D. (1978). The sausage machine: A new two-stage parsing model. Cognition, 6(4), 291–325.
Gibson, E. (1998). Linguistic complexity: Locality of syntactic dependencies. Cognition, 68(1), 1–76.
Gibson, E. (2000). The dependency locality theory: A distance-based theory of linguistic complexity. In Y. Miyashita, A. Marantz, & W. O’Neil (Eds.), Image, language, brain: Papers from the first mind articulation project symposium (pp. 95–126). MIT Press.
Hoji, H. (1985). Logical form constraints and configurational structures in Japanese [Doctoral dissertation, University of Washington].
Jaeger, T. F. (2008). Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Journal of Memory and Language, 59(4), 434–446.
Kuno, S. (1973). The structure of the Japanese language. MIT Press.
Lewis, R. L., & Vasishth, S. (2005). An activation-based model of sentence processing as skilled memory retrieval. Cognitive Science, 29(3), 375–419.
Miyagawa, S. (2010). Why agree? why move? unifying agreement-based and discourse-configurational languages. MIT Press.
Omaki, A., Davidson White, I., Goro, T., Lidz, J., & Phillips, C. (2014). No fear of commitment: Children’s incremental interpretation in English and Japanese wh-questions. Language Learning and Development, 10(3), 206–233.
Phillips, C. (1996). Order and structure [Doctoral dissertation, MIT].
Phillips, C. (2006). The real-time status of island phenomena. Language, 82(4), 795–823.
Saito, M. (2021, June 25). Wh-phrases without quantificational particles. Seminar of JSPS Core-to-Core Program (A) International Research Network for the Human Language Faculty, Online.
Stowe, L. A. (1986). Parsing wh-constructions: Evidence for on-line gap location. Language and Cognitive Processes, 1(3), 227–245.
Tamaoka, K., & Mansbridge, M. (2019). Regressive eye movements in reading Japanese scrambled sentences: Evidence against linear locality. Gengo Kenkyu, 155, 35–63.
Traxler, M. J., & Pickering, M. J. (1996). Plausibility and the processing of unbounded dependencies: An eye-tracking study. Journal of Memory and Language, 35(3), 454–475.
Trueswell, J. C., Tanenhaus, M. K., & Garnsey, S. M. (1994). Semantic influences on parsing: Use of thematic role information in syntactic ambiguity resolution. Journal of Memory and Language, 33(3), 285–318.

Figure 1. Three pictures presented with (8).

Figure 2. Key press responses to ambiguous wh-questions.

Figure 3. Proportion of looks to the MC and EC pictures over the time course of the ambiguous target questions. The first and second vertical lines indicate the mean onset of the EC noun phrase (e.g., chouchou-o, butterfly-ACC) and the mean onset of the MC verb (e.g., iimashita, told), respectively. Orange and green bars indicate p-values for the individual time bins (orange = p > 0.05, green = p < 0.05). The grey-shaded areas mark the time windows in which the proportion of looks to the two pictures differed significantly, as identified by the permutation analysis.

Figure 4. Proportion of looks to the MC and EC pictures from the onset of the EC noun phrase to the sentence offset of the ambiguous target questions. Orange and green bars indicate p-values for the individual time bins (orange = p > 0.05, green = p < 0.05). The grey-shaded areas mark the time windows in which the proportion of looks to the two pictures differed significantly, as identified by the permutation analysis.

Figure 5. Proportion of looks to the MC and EC pictures over the time course of the unambiguous target questions. Orange and green bars indicate p-values for the individual time bins (orange = p > 0.05, green = p < 0.05). The grey-shaded areas mark the time windows in which the proportion of looks to the two pictures differed significantly, as identified by the permutation analysis.

Figure 6. Proportion of looks to the MC and EC pictures from the onset of the EC noun phrase to the sentence offset of the unambiguous target questions. Orange and green bars indicate p-values for the individual time bins (orange = p > 0.05, green = p < 0.05). The grey-shaded area marks the time window in which the proportion of looks to the two pictures differed significantly, as identified by the permutation analysis.

Table 1. Significant clusters per group identified by cluster-based permutation tests. Time windows indicate the duration over which the significant cluster was observed.

Cluster Number	Time Window (ms)	Observed Sum t	p-Value
41	880–980	14.26	0.019
74	1680–2400	177.67	<0.001

Table 2. Significant clusters identified by the second cluster-based permutation tests. Time windows indicate the duration over which the significant cluster was observed.

Cluster Number	Time Window (ms)	Observed Sum t	p-Value
25	460–1260	−193.09	<0.001
45	1720–1820	14.47	0.028
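
The "Observed Sum t" column in Tables 1 and 2 is a cluster-level statistic: the t-values of adjacent time bins that pass a threshold are summed, and the sum is compared with a distribution obtained by permutation. The Python sketch below illustrates that general idea only. It is not the authors' analysis pipeline, and the function names, the threshold of 2.0, the sign-flipping permutation scheme, and the input layout (one row per participant, one column per time bin, each cell holding the MC-minus-EC difference in looking proportion) are all assumptions made for illustration.

```python
import numpy as np

def t_values(diff):
    """One-sample t statistic per time bin; diff is (n_participants, n_bins)."""
    n = diff.shape[0]
    sd = diff.std(axis=0, ddof=1)
    return diff.mean(axis=0) / (sd / np.sqrt(n))

def clusters(t, threshold):
    """Return (start, end, sum_t) for each run of adjacent bins whose
    t-values exceed the threshold in absolute value and share a sign."""
    found, start = [], None
    sign = np.where(np.abs(t) > threshold, np.sign(t), 0)
    for i in range(len(t) + 1):
        cur = sign[i] if i < len(t) else 0
        if start is not None and cur != sign[start]:
            # The run ended at bin i - 1: record its summed t-value.
            found.append((start, i - 1, float(t[start:i].sum())))
            start = None
        if start is None and cur != 0:
            start = i
    return found

def cluster_permutation_test(diff, threshold=2.0, n_perm=1000, seed=0):
    """Return (start, end, sum_t, p) for every observed cluster.
    The null distribution is built by randomly flipping the sign of each
    participant's whole time series (an assumed permutation scheme)."""
    rng = np.random.default_rng(seed)
    observed = clusters(t_values(diff), threshold)
    null_max = np.zeros(n_perm)
    for k in range(n_perm):
        flips = rng.choice([-1.0, 1.0], size=(diff.shape[0], 1))
        perm = clusters(t_values(diff * flips), threshold)
        # Keep only the largest absolute cluster sum from this permutation.
        null_max[k] = max((abs(s) for _, _, s in perm), default=0.0)
    results = []
    for a, b, s in observed:
        p = (1 + np.sum(null_max >= abs(s))) / (1 + n_perm)
        results.append((a, b, s, float(p)))
    return results
```

Under these assumptions, a positive summed t would correspond to a window with more looks to the MC picture and a negative one to more looks to the EC picture. Converting the returned bin indices into millisecond windows such as those in the tables depends on the bin width, which this sketch does not fix.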

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
