Multimodal Resolution of Overlapping Talk in Video-Mediated L2 Instruction

Abstract: This paper investigates a pervasive phenomenon in video-mediated interaction (VMI), namely, simultaneous start-ups, which happen when two speakers produce a turn beginning in overlap. Based on the theoretical and methodological tenets of conversation analysis and interactional linguistics, the present study offers a multimodal and sequential account of how simultaneous start-ups are oriented to and solved in the context of English as an additional language (L2) tutoring. The micro- and sequential analysis of ten hours of screen-recorded video-mediated data from tutoring sessions between an experienced tutor and an advanced-level tutee reveals that the typical overlap resolution trajectory results in the tutor withdrawing from the interactional floor. The same analysis uncovered a range of resources, such as lip pressing and the verbal utterance ‘go ahead’, employed in what we call enhanced explicitness, through which the withdrawal is done. The orchestration of these resources allows the tutor to exploit the specific features of the medium to resolve simultaneous start-ups while also supporting the continuation of student talk. We maintain that this practice is used in the service of securing the learner’s interactional space, and consequently in fostering the use of the language being learned. The results of the study help advance current understandings of L2 instructors’ specialized work of managing participation and creating learning opportunities. Being one of the first studies to detail the practices involved in overlap resolution in the micro-context of simultaneous talk on Zoom-based L2 instruction, this study also makes a significant contribution to research on video-mediated instruction and video-mediated interaction more generally.


Introduction
One of the basic rules of the turn-taking system described by Sacks et al. (1974) is that one party talks at a time, with no or minimal gap or overlap between turns. When overlap occurs, participants draw on the overlap resolution device, which is an organized set of practices by which the parties attempt to resolve the overlap and return to a 'one-at-a-time' situation (Schegloff 2000). Although several kinds of overlapping talk do not need to be avoided or resolved (see, e.g., the case of early responses, explored by Deppermann and Schmidt 2021), certain forms of overlapping talk are considered "recognizable events" (Schegloff 2000, p. 11), which participants orient to and act upon to secure the progressivity of ongoing talk. An example of this type of overlap is simultaneous start-ups (Schegloff 2000), which are the focus of this paper. Simultaneous start-ups are overlaps that occur at transition relevance places (TRP), i.e., places in the interaction "where a change of speakership becomes a salient possibility" (Clayman 2012, p. 151) and are recognized as initiating a new turn-at-talk.
Languages 2022, 7, 154
In order to avoid beginning a new turn in overlap or to resolve simultaneous start-ups as quickly as possible, speakers monitor each other closely; that is, ending and incoming turns are projected via participants' gestures, modified posture, facial expressions, and manipulation of objects (Mortensen 2009; Oloff 2013; Mondada 2007, 2014). In video-mediated interaction, however, participants' access to each other's vocal and embodied behavior is impacted due to a range of medium-specific features, such as corporal segmentation, lack of reliance on eye gaze direction, and synchronicity issues, e.g., delays (Luff et al. 2003; Arminen et al. 2016; Licoppe and Morel 2012; Ruhleder and Jordan 2001; Rusk and Pörn 2019; Seuren et al. 2021). As a result, overlapping talk that requires negotiation from the participants (e.g., simultaneous start-ups) has been shown to happen more frequently in video-mediated encounters in comparison to face-to-face interaction (Schneider 2017; Olbertz-Siitonen 2015; Seuren et al. 2021), and is impacted by, e.g., the higher transition time (487 ms) in comparison with face-to-face conversation (135 ms) (see Boland et al.'s (2021) experimental study with dyads interacting via Zoom).
In the data analyzed in this paper, which stem from English tutoring sessions held over Zoom, we identified 41 instances of simultaneous start-ups in ten sixty-minute sessions. Our analysis shows that most cases of this type of overlap result in the student remaining on the floor, and that the tutor draws on a range of multi-semiotic resources to enhance the explicitness of the turn-yielding moves that she uses to secure the floor for the student. We argue that this recurrent resolution trajectory reflects how participants' conduct is shaped by the affordances of the medium, as well as the tutor's orientation to moments of simultaneous start-ups as emergent language use/learning opportunities (Sert 2017) and her role as an 'interactional manager' (Walsh 2006; Kasper 2004) responsible for crafting such opportunities in situ.
In what follows, we review studies on overlap resolution in video-mediated interaction (VMI), video-mediated additional language (L2) learning, and tutoring, which serve as background to the present contribution.
Schegloff (2000) identified distinct phases through which overlaps are solved with reference to the onset of the overlapping talk: the pre-onset phase, post-onset phase, post-post-onset phase, and post-resolution phase. During the pre-onset phase, the current speaker may react to their co-participant's incipient attempts to take the floor (for instance, through gaze, pointing, loud in-breaths) and design their utterance to prevent overlap (e.g., by speeding up the pace of talk). In the post-onset phase, speakers adapt to the fact that they are producing talk simultaneously. This adaptation includes halts in the progressivity of talk, usually referred to as "hitches" (e.g., cut-offs) or prosodically marked articulation of talk (e.g., slower tempo), labelled "perturbations" (Schegloff 2000, p. 11). Although hitches and perturbations are found in other interactional environments as well, they acquire specific functions in the context of overlap resolution, as they index that the participants have noticed a problem with progressivity that may impact mutual understanding. In the post-post-onset phase, speakers usually launch what Schegloff calls a "contest" for the floor, by speaking louder, for instance. In the post-resolution phase, the "winner" of the floor adapts back to speaking solo (Schegloff 2000, p. 44). At each beat of overlapping talk, the participants take a stance on what to do next, i.e., to produce the next beat in overlap or to stop talking.

Overlap Resolution in Video-Mediated Interaction
More recently, studies working with video-based data (Heath and Luff 1993; Seuren et al. 2021; Ruhleder and Jordan 2001) have shown how, in VMI, the resolution of overlaps involves dealing with the inherent asymmetries and incongruencies in the production and reception of interactional moves. For instance, Seuren et al.'s (2021) study of video-mediated consultations in the UK showed that due to delays between the production of a turn by the speaker and the perception of the same turn by the recipient, participants may perceive moments of actual talk as silence. Accordingly, the practices for the resolution of overlapping talk described above may be delayed for one or both interlocutors, and it may take several turns (and new overlaps) for participants to reach an agreement on who should remain on the floor. In a study of the use of videoconferencing to support remote teamwork, Ruhleder and Jordan (2001) found that transmission delays impacted the course of interaction but seemed to go unnoticed by the participants, who were not able to identify the source of emerging trouble (but see Olbertz-Siitonen 2015 for partially different results). In such contexts, the timely response of one speaker was perceived as late and in overlap with the continuation of another speaker, who might not have continued if the response had come in on time.
All in all, participants in video-mediated interaction do not seem to acknowledge the existence of latency effects (Heath and Luff 1993; Seuren et al. 2021); that is, they tend to orient to "two non-mutual realities" as a "shared one" (Seuren et al. 2021, p. 66). At the same time, as Heath and Luff (1993) point out, participants' conduct is affected by such incongruencies, as evidenced, for instance, by the upgrading of certain gestures, especially in cases in which their first attempts at certain interactional moves prove unsuccessful.
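The latency effect reported in these studies can be made concrete with a minimal sketch. Everything in it is an assumption made for illustration: the 400 ms one-way delay and the turn timings are invented and come neither from the studies cited nor from our data, and `Talk`, `perceive`, and `overlap_ms` are hypothetical names. The sketch only shows how one and the same stretch of talk is timed differently at each end of the connection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Talk:
    speaker: str
    start_ms: int  # onset as produced, at the producer's end
    end_ms: int    # offset as produced, at the producer's end


def perceive(listener: str, talk: Talk, delay_ms: int) -> Talk:
    """Timing of `talk` at the listener's end: a speaker's own talk is
    unshifted, remote talk arrives `delay_ms` later."""
    if talk.speaker == listener:
        return talk
    return Talk(talk.speaker, talk.start_ms + delay_ms, talk.end_ms + delay_ms)


def overlap_ms(a: Talk, b: Talk) -> int:
    """Duration for which two stretches of talk co-occur (0 if none)."""
    return max(0, min(a.end_ms, b.end_ms) - max(a.start_ms, b.start_ms))


DELAY_MS = 400  # assumed one-way transmission delay (illustrative only)

# Invented timings: A completes a turn; B hears it end at 2400 ms (B's end)
# and responds 200 ms later; A, hearing only silence, continues at 2500 ms.
first_turn = Talk("A", 0, 2000)
response = Talk("B", 2600, 3600)
continuation = Talk("A", 2500, 3500)

for listener in ("A", "B"):
    first, resp, cont = (
        perceive(listener, t, DELAY_MS)
        for t in (first_turn, response, continuation)
    )
    starter = min(resp, cont, key=lambda t: t.start_ms).speaker
    print(
        f"At {listener}'s end: gap before response = "
        f"{resp.start_ms - first.end_ms} ms, "
        f"overlap = {overlap_ms(resp, cont)} ms, "
        f"first starter = {starter}"
    )
```

Under these assumed numbers, B's response follows A's turn after a 200 ms gap at B's end but arrives 1000 ms after A stopped at A's end, where it collides with A's continuation. Each party also perceives themselves as the one who started first, which is one way of picturing the "two non-mutual realities" described by Seuren et al. (2021).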
Against this background, this study explores turn beginnings that are produced in overlap in the context of L2 video-mediated instruction. We empirically show that in the context of our data, L2 tutoring aimed at advancing learners' speaking skills, simultaneous start-ups are resolved in a more explicit fashion in comparison to face-to-face interaction as described by Schegloff. The typical resolution trajectory and participants' moves observed by Schegloff (2000) and others (Oloff 2013; French and Local 1983), i.e., stopping talking and thus yielding the floor, or continuing speaking (e.g., louder) and thus staying on the floor, are not the norm in our data. Rather, in most of our cases, the negotiation involves dropping out of the floor in a rather upgraded fashion even in moments when the overlap is not persisting. As this practice is pervasively used by the tutor only, we claim that it reflects both participants' adaptation to the specific features of the medium (as prior VMI studies have identified) and the tutor's orientation to moments of simultaneous start-ups as locally emerging opportunities for the learner to use the language being learned. A similar claim has been made by Cancino Avila (2019), whose study is, as far as we know, the only CA study on overlapping talk in language instructional settings. Although his analyses took into consideration different types of overlaps and drew on face-to-face multi-party classroom data, his findings indicated that by providing learners with interactional space in moments of overlap, teachers display classroom interactional competence (Walsh 2006).

Video-Mediated L2 Learning
As pointed out by González-Lloret (2015), micro-analytical studies on video-mediated L2 learning can be categorized into two major strands: descriptive studies (e.g., Uskokovic and Taleghani-Nikazm 2022; Wigham 2017; Hampel and Stickler 2012; Rusk and Pörn 2019; Dooly and Davitova 2018; Jakonen and Jauni 2021; Balaman 2019) and developmental studies (e.g., Balaman and Doehler 2022; Doehler and Balaman 2021; Sert 2017; Balaman 2018). Descriptive studies seek to unveil the generic structures of VMI, that is, how participants manage sequential organization, turn-taking, repair, etc., in such settings. For instance, Uskokovic and Taleghani-Nikazm (2022) describe how L2 speakers use a gesture with the index finger alone or in combination with the utterance ein moment ('one moment' in English) to create space for screen-based word searches in interactions-for-learning between German L1 and L2 speakers. In turn, developmental studies seek to track the development of interactional practices over time. For example, in his study of hinting sequences within a task-based activity, Balaman (2018) shows how a learner increasingly diversifies the resources used to provide hints to her classmates about how to complete the task at hand.
A recurring theme in both strands is participants' methods for securing and maintaining progressivity. Several studies describe how participants use affordances of the medium to restore progressivity, while others focus more specifically on the emergence of practices to deal with technological constraints. Dooly and Davitova (2018), for example, document how the practices of showing smartphone screens and typing are used by a group of teenage learners to maintain progressivity in the face of communication barriers. Balaman and Doehler (2022) and Doehler and Balaman (2021) describe the development and routinization of grammatical formats (e.g., let me and let's x) used to transition into screen-based activities. Finally, in their study of Swedish-Finnish Tandem interactions, Rusk and Pörn (2019) show the local emergence of several strategies to deal with lags, such as an explicit orientation by the L1 speaker towards the L2 speaker's incoming turn.
The current paper is situated within the 'descriptive' strand. It adds to this body of work on video-mediated L2 learning by providing a detailed account of participants' interactional conduct when dealing with turn-taking-related disruptions in progressivity in the understudied context of L2 online tutoring.

Tutoring as an Instructional Activity
EM/CA-inspired studies on tutoring, both face-to-face and video-mediated, have analyzed the micro-analytic and sequential aspects of this type of instructional activity in various disciplines (see, for example, DiFelice Box (2015) and Creider (2020) on Math tutoring involving children and Bowden and Svahn's (2020) single-case analysis of video-mediated Math homework support with an upper-secondary student). In the context of L2 instruction, the rather small body of micro-analytic studies on tutoring has explored, for example, the use of hand gestures to achieve intersubjectivity (Belhiah 2013), the sequential organization of openings and closings (Belhiah 2009), and how advice is negotiated and resisted (Leyland 2018; Park 2017; Waring 2005). Turning specifically to online L2 tutoring, an instructional set-up that has grown exponentially as a result of technological advances (Hamid et al. 2018) maximized by the limitations on co-present encounters imposed by the spread of the COVID-19 pandemic, Nguyen et al. (2022) investigated online search sequences in dyadic Skype interactions between an adult tutee and his L2 tutor. Their paper shows how the word 'corkscrew' becomes a teachable and learnable (Eskildsen and Majlesi 2018) because the participants were unable to use the medium's screen-sharing feature. As their analysis reveals, this constraint led to extensive epistemic negotiation between the participants, which occasioned the emergence of the word and its subsequent use by the L2 learner.
One aspect of L2 tutoring that remains largely overlooked concerns specific practices for turn-taking management, which seem to be particularly relevant in this context due to its less constrained interactional arrangement (as compared to regular classrooms). For example, Belhiah (2009) showed that students and tutors tended to carry out the tutoring business rather collaboratively, and that students' self-initiated turns were frequently welcomed by the tutors. In one of his excerpts, a student's initiating turn overlaps with that of the teacher, who, similarly to what we find in our data, promptly yields the floor to the learner. The details of this negotiation for the floor, however, are not addressed in his examination. Indeed, no systematic analysis of overlaps in the context of L2 tutoring has been provided so far.
The current paper thus contributes to filling this gap in the CA L2 instruction literature by exploring overlap resolution in video-mediated encounters involving an experienced L2 instructor and an advanced adult learner engaged in what we call conversational tutoring. We use this term, which emerged from our work with the data, to refer to one-on-one tutoring sessions led by an L2 teacher that are not part of an institutionalized course program or fixed-term curriculum. Nonetheless, we consider L2 conversational tutoring to be a teaching-learning arrangement. The term 'conversational' refers both to the stated goals of the encounters (to improve students' speaking skills) and to their overall sequential structure, which comprises a great deal of interaction that appears 'more conversational' than typically 'instructional' (see Context and Methods).
The main goal of this paper is to explore the embodied means through which overlaps are solved and what affordances such resolutions have for this type of L2 instructional activity. A secondary agenda is to consider what interactional demands engender the typical resolution design found in our data, i.e., the tutor's use of enhanced explicitness in turn-yielding.

Context and Methods
The data analyzed in this study come from a budding corpus of video-mediated language instruction. The interactions took place through Zoom, a video teleconferencing software program that became popular during the outbreak of the COVID-19 pandemic. The participants were Mari (self-identified as female), a certified and experienced instructor who had taught L2 English for more than twenty years both in language courses and at the university level, and Ivo (self-identified as male), an adult B2-level learner. They started meeting weekly in April 2020 (Figure 1). Both participants are Brazilian and were living in Europe at the time of the data collection. Ivo's stated goal with the tutoring sessions was to improve his English speaking skills.
As stated earlier, the lessons were not based on any pre-assigned course material and were designed by the tutor as mostly comprising conversational practice that was very seldom interrupted for corrections (most corrections were done after the session via WhatsApp or a shared editable file). When there were specific grammatical or lexical units pre-prepared by the tutor, they often drew on issues that emerged during the sessions or related tasks.
Broadly speaking, Ivo and Mari's encounters followed an overall sequential structure comprising several phases. These included both the general phases identified in everyday and institutional video-mediated interactions, such as pre-opening/opening and pre-closing/closing phases (Mondada 2015; Ilomäki and Ruusuvuori 2020) as well as typical language instruction phases (e.g., homework checking and task instruction giving), although these more typical 'instructional' phases were not present in some of the encounters and were held primarily on a conversational basis. The pre-opening and opening phases comprised checks from both parties regarding the quality of the audio and the video as well as the participants' greetings. Immediately after this phase, Ivo and Mari engaged in extended how-are-you sequences and updates, followed by a transition to the 'business of the day', proposed by Mari, which included, e.g., a pre-reading activity. During the activities proposed by the instructor, the participants often initiated sequences of talk that were arguably 'conversational', i.e., which did not specifically advance the proposed activity.
The recordings were made from Mari's computer with the written consent of both participants. For the purposes of the study presented here, the first ten recorded lessons were transcribed according to the GAT 2 conventions for English (Couper-Kuhlen and Barth-Weingarten 2011) and the Mondada system for multimodal transcription (Mondada 2019). Combined, these systems allow for detailed prosodic-phonetic and bodily-visual description 1 .
This study draws on the methodological framework of Conversation Analysis (CA) and Interactional Linguistics (IL). IL investigates linguistic resources as they are used in social interaction (Couper-Kuhlen and Selting 2018) and has been heavily influenced by CA, which seeks to unveil the nature of the orderliness observed in social interaction. Both fields depart from the basic assumption of "order at all points" (Sacks 1992, p. xlvii), i.e., no detail is dismissed as irrelevant to participants before careful consideration. Likewise, IL and CA approach social interaction from an emic perspective, through which context is reflexively constructed through participants' talk (Schegloff 1997) and participants' sensemaking practices to accomplish actions are reconstructed through the next-turn proof procedure.
As a CA/IL study, our analysis started with the collection and detailed transcription of candidate single instances of the focal phenomenon (see Hoey and Kendrick 2017 for collections in CA), which yielded a total of 41 clear cases of simultaneous start-ups. We considered turn beginnings that overlapped in the pre-onset and onset phase (see Section 2 above) to be clear cases of simultaneous start-ups. We excluded non-competitive overlaps, such as utterance completions and co-constructions, agreements, brief assessments, and laughter. As we only had the recordings from Mari's perspective, our analysis was based on the 'version' of the interaction uniquely available to one of the participants (cf. Olbertz-Siitonen 2015).
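The cases in our collection were identified through close, case-by-case analysis of the recordings, not automatically. Purely as an illustration, the inclusion and exclusion criteria above could be expressed as a pre-screening pass over time-aligned turn annotations. The sketch below rests on assumptions that are not part of our procedure: the `Turn` record, the `kind` labels, the 300 ms onset window, and all timings are hypothetical, and silence before the first onset is used only as a rough proxy for a TRP.

```python
from dataclasses import dataclass

# Hypothetical labels for the overlap types excluded from the collection.
NON_COMPETITIVE = {
    "completion", "co-construction", "agreement", "assessment", "laughter",
}


@dataclass(frozen=True)
class Turn:
    speaker: str
    onset_ms: int
    offset_ms: int
    kind: str = "turn"  # hypothetical annotation label


def candidate_start_ups(turns, max_onset_gap_ms=300):
    """Return pairs of adjacent turns, by different speakers, that both
    begin in the clear and overlap at their beginnings."""
    ordered = sorted(turns, key=lambda t: t.onset_ms)
    candidates = []
    prior_end = 0  # latest offset of any talk that began before `first`
    for first, second in zip(ordered, ordered[1:]):
        in_the_clear = first.onset_ms >= prior_end
        near_simultaneous = second.onset_ms - first.onset_ms <= max_onset_gap_ms
        overlapping = second.onset_ms < first.offset_ms
        competitive = (
            first.kind not in NON_COMPETITIVE
            and second.kind not in NON_COMPETITIVE
        )
        if (
            first.speaker != second.speaker
            and in_the_clear
            and near_simultaneous
            and overlapping
            and competitive
        ):
            candidates.append((first, second))
        prior_end = max(prior_end, first.offset_ms)
    return candidates


# Invented annotations, loosely modelled on the shape of a short exchange.
annotated = [
    Turn("IVO", 0, 2000),
    Turn("MARI", 2300, 2600, "agreement"),
    Turn("MARI", 3500, 4100),
    Turn("MARI", 4400, 4700),
    Turn("IVO", 4450, 6000),
    Turn("IVO", 6500, 8000),
    Turn("MARI", 6600, 7000, "laughter"),
]

found = candidate_start_ups(annotated)
for first, second in found:
    print(first.speaker, first.onset_ms, "/", second.speaker, second.onset_ms)
```

A pass of this kind could at most flag candidates for inspection. Whether a flagged pair is a simultaneous start-up that the participants treat as a recognizable event remains an analytic question to be settled on the transcript and the video.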

Results: The Multimodal Design of Simultaneous Start-Up Resolutions
The analytical part of this paper illustrates the typical simultaneous start-up resolution trajectory found in our data, which comprises Ivo "winning" the floor (Schegloff 2000) and Mari indicating her dropping out through what we call enhanced explicitness, i.e., through the combination of different linguistic and embodied resources used in what Mondada (2014) calls a complex Gestalt. One case in which the enhanced explicitness strategy is not used is presented as well, with the aim of discussing possible explanations for the distribution of the enhanced explicitness strategy in our collection.

The Lip Pressing Gesture
Excerpt 1 shows Mari's use of a lip gesture to more explicitly withdraw from the competition for the floor. It takes place at the beginning of the session. Mari and Ivo had spent a couple of minutes looking up landmarks in Mari's neighborhood, at that time unknown to Ivo, on Google Maps. Mari had told Ivo that one of the advantages of living in that particular area was the easy access to parking slots in comparison to other parts of the city. We join their conversation as Ivo affiliates with this stance. The simultaneous start-up takes place in lines 6 and 7. After Ivo's turn (lines 1-2), Mari produces a minimal token that is hearably an agreement with what Ivo just said (line 3). As Ivo gazes down and does not produce another turn (line 4), Mari explicitly formulates her agreement with "that's true" (line 5), which constitutes both a TRP and a place of possible sequence closure. This is when the simultaneous start-up takes place, with both Mari and Ivo producing and-prefaced new turns (lines 6 and 7). For her part, when Mari produces the beginning of her turn, she moves her head to the right towards the window. Although we cannot know for sure, this body movement may have contributed to the simultaneous start-up, as it may have impeded the monitoring of each other's embodied turn incipiency. Right after the overlap, we hear a halt in Mari's turn while she is looking away. For his part, Ivo produces a hesitation marker, uh:, followed by a pause, and then continues speaking. Mari's next embodied action, done through a pressing lip gesture (line 7, Figure 4) beginning at the end of Ivo's hitch, confirms her withdrawal from the floor. As Ivo continues speaking, Mari keeps her lips pressed, turns to the screen, and nods.
Because of latency issues, participants in video-mediated interaction may face difficulties signaling to their co-participants that they have dropped out of the floor, which can lead to extended stretches of simultaneous talk and several attempts at overlap resolution (Seuren et al. 2021). In the example shown here, the lip pressing gesture, a more explicit resource than glottal or labial stops (Schegloff 2000), allows Mari to underscore her non-speaking stance, and functions as another signal for Ivo to continue his turn. Excerpt 1 thus shows Mari's exploitation of the talking heads configuration (Licoppe and Morel 2012) afforded by the video-conferencing program. Furthermore, because it is non-vocal, this resource facilitates the coordination of speaker transition (cf. Seuren et al. 2021).

The Go Ahead Utterance
Excerpt 2 was extracted from Mari and Ivo's first lesson, in which the participants are engaged in a video activity. The video is about strategies that polyglots use to learn a new language. Mari's 'instructional project' (Kimura et al. 2018) seems to be to assess Ivo's understanding of this type of material as well as to discuss language learning strategies that may work for him in particular. The base sequence starts before the excerpt, as Mari pauses the video clip and asks Ivo to report on main ideas conveyed in the video and on whether he agrees with them. We join the conversation towards the end of Ivo's response.
This excerpt exhibits two simultaneous start-ups (lines 14-15 and 17-18). While our focus is on the second one (line 17), we will begin by taking a close look at the first due to its import to the emergence of the second simultaneous start-up. Between lines 1 and 11, Ivo describes his current difficulties with learning English. Although Ivo's turn "and i can't improve in more tha more than this" (line 11) is syntactically and pragmatically complete, its mid-rising final intonation, together with his downward gaze, suggests more to come. Throughout Ivo's TCU in line 11, Mari is looking down, involved with note-taking. Mari utters a mid-rising continuer, m_HM, (line 12), thereby registering Ivo's TCU and treating it as a non-turn-final one (Jefferson 1984). It is during a pause of 1.7 s (line 13), when Mari is taking notes and Ivo is looking down, that the first simultaneous start-up takes place (lines 14-15).
During this pause, had Mari been looking at her screen she might have noticed Ivo's visible in-breath (through torso movement) and mouth opening, both signs that he was about to start talking again. Ivo then utters a pre-turn-beginning u:h (line 14) (Schegloff 2010), potentially not noticed by Mari, which indicates an attempt to secure his turn space while he formulates his utterance. Mari comes in in overlap with a second continuer "m_HM", with reduced loudness and slower tempo (line 15), which does the work of explicitly displaying attentiveness to Ivo's ongoing turn while she is gazing down engaged in note taking. As Ivo does not launch a TCU right away, as projected by the u:h (note the pause in line 16), another simultaneous start-up ensues. While Ivo finally launches his TCU with so (line 18), Mari produces a click and delivers a yes with mid-falling final intonation (line 17), after which Mari's mouth is kept slightly open, indicating that Mari was going to continue speaking. This conduct seemingly indicates both that she was done note taking and her readiness to take over speakership (Jefferson 1984).
At this point, Ivo produces two uhs (line 18) and stops talking, which hearably constitute overlap-related hitches (Schegloff 2000). That Mari interprets Ivo's conduct as somewhat oriented to the overlap as "a recognizable event" (Schegloff 2000, p. 11) is supported by Mari's subsequent conduct. For one, she closes her mouth and nods (lines 18-19). Second, as Ivo does not continue immediately (see the pause in line 19), Mari utters a go ahead while shifting her body positioning and gaze direction (line 20). The verbal go ahead is followed by an apology (line 21), and a smile (line 21). In her move to make sure that Ivo remains on the floor, similarly to what she does in Excerpt 1, Mari uses the enhanced-explicitness strategy in the resolution of the simultaneous start-up. This time, such enhanced explicitness is embodied by the use of a verbal utterance, through which Mari officially allocates and secures the floor to Ivo and encourages him to continue.
Locally, two main contingencies may explain the design of her withdrawing. First, this is a second simultaneous start-up that happens right after the first one. Second, Ivo stops talking after uttering two non-lexical uhs. Arguably, those two events together render this moment as one of persisting turn-taking issues, which warrants the use of even more explicit means (as compared to Excerpt 1) to resolve the overlap. As Mari turns to the screen while saying "go ahead", she can see that Ivo adopts a stand-by posture as he stops talking and stares at the screen (line 18, Figure 6). The apology "sorry" (line 21) further underscores the inappositeness of Mari's incoming and indexes that Mari holds herself accountable, as the one responsible for the smooth unfolding of the interaction, for not paying full attention to Ivo's moves. This orientation to Mari's role as interactional manager and instructor is further showcased by Excerpt 3 below.

The Combination of the Lip Pressing Gesture and Go Ahead Utterance
Our third example comes from Mari and Ivo's first session, in April 2020, when local COVID-19 lockdown measures were slowly starting to be lifted after a stricter lockdown period. Mari and Ivo had been talking about how they felt during the lockdown. The simultaneous start-up is located in lines 7 and 8.
Lines 1-3 show Mari's resaying of her previous comment about missing the possibility of spending time outdoors. The design of Mari's turn in line 6, along with the alternating eye gaze direction (lines 3-7), suggests a potential sequence and topic-closure (Schegloff 2007). In line 7, Mari's turn is delivered as she is gazing down (potentially at her notes), which, together with the connector and, suggests that she was indeed about to launch a new (sub-)topic. As Mari is gazing down, Ivo opens his mouth and then produces an audible in-breath, which is delivered in overlap with Mari's turn.
Mari's dropping out of the floor happens as soon as she hears Ivo's turn. She first does this by simply interrupting her talk. As a consequence, the continuation of Ivo's turn, containing a prosodically marked high-pitched yeah and presumably the first sound of the conjunction but, is heard in the clear (line 8). This turn design projects a concessive yes-but construction (Couper-Kuhlen and Thompson 2000), and thus suggests that Ivo's talk was going to propose a slightly different understanding than the one formulated by Mari in lines 5-6, while remaining on the same topic.
The halt of his turn seemingly displays Ivo's stance towards the fact that he and Mari are talking in overlap. He does not compete for the floor and seems to be uncertain about who should continue, which causes the talk to be momentarily halted. During this pause (line 9), Mari's mouth is closed and Ivo is nodding slightly. Mari then further displays her withdrawal by pressing her lips while moving her head back to the center/front of the screen, which underscores her readiness to listen to Ivo. What follows is a yet more explicit display of Mari's withdrawal: she produces a turn containing go ahead while nodding and raising her eyebrows, then smiles (line 10). For his part, Ivo first produces a freestanding okay (Couper-Kuhlen 2021), accompanied by a subtle nod (line 11). Although okays are particularly known across languages for acknowledging receipt of information (Couper-Kuhlen 2021; Oloff 2019), the prosodic delivery (faster than surrounding talk and with mid-rising intonation) of Ivo's okay suggests that it may be doing more than merely acknowledging that Ivo has heard that Mari and he are speaking in overlap. However, in the specific context of simultaneous start-up resolution in video-mediated interaction, it is difficult to determine its function. It could be checking whether the lag was over on Mari's end, requesting confirmation from Mari that he could/should continue, or even yielding the floor to Mari. Finally, through its strategic position and design, Mari's smile does several things. First, it further signals her withdrawal from the floor, and thus from the negotiation of speakership. Second, by maintaining her smile at this point in the interaction, Mari imprints a playful stance towards her invoking of her primary rights over the interactional floor as well as the tacit norms of their encounter, which includes Mari facilitating Ivo's access to the floor.
The design of Mari's actions via a cluster of facial gestures (Kendon 2004) and other linguistic and prosodic resources in the overlap resolution in this case points to the local management of the business at hand, i.e., managing the interactional floor and maximizing learners' language use by contingently securing the floor to the student in moments of overlap. By not continuing to speak and signaling her dropping out of the floor in such a marked fashion, Mari momentarily invokes her deontic status, i.e., "the relative position of power that a participant is considered to have or not to have, irrespective of what he or she publicly claims" (Stevanovic 2018, p. 375), as instructor. Ivo's subsequent turn with another okay, this time prefaced by a change of state token (Heritage 2016) oh (line 13), is delivered with mid-falling intonation in a multimodal package of its own, with laugh particles and accompanied by an eye roll and a head movement (line 13, Figure 8), thus affiliating with the playful stance projected by Mari's embodied actions to resolve the overlap. It complies with Mari's turn-yielding move in a humorous enactment of feigned resistance.
Excerpts 1-3 showcase, in micro-analytical and sequential terms, a recurrent trajectory of simultaneous start-up resolution in our data, i.e., that the tutor orients to the tutee as the one who should continue talking after moments of simultaneous start-ups. They are also representative of how the handing over of the floor happens; it is accomplished through a rich multimodal Gestalt of resources mobilized by the tutor in a locally-sensitive way as she adjusts her conduct to the medium as well as to her student's moves. We propose that such moves, i.e., recurrently dropping out of the floor at moments of overlap and indicating it through more explicit embodied and linguistic methods, constitute an emergent interactional practice used by the teacher to manage turn-taking and yield the turn to the student. As such, we argue that this recurrent resolution trajectory reflects and constitutes part of the tutor's role of 'interactional manager' (Walsh 2006; Kasper 2004) and as the crafter of learners' emergent language use opportunities (Sert 2017).
That this practice reflects the tutor's orientation to her instructional role and the larger agenda of securing and maximizing the student's interactional space is further substantiated by an example in which the more explicit resources are not employed. We turn to this last example in Excerpt 4 below.

A Contrasting Case: The Resolution of Overlap without the Enhanced Explicitness Strategy
Excerpt 4 is a case in which the simultaneous start-up is solved without the enhanced explicitness strategy. It takes place half-way through the session. As Mari corrected Ivo's use of pronouns, they ended up talking about the adequacy of pronouns and their relation to sexist language. We join the interaction when Mari is telling Ivo about her M.A. dissertation, which she wrote in Portuguese. In line 1, Mari states that when writing her dissertation, she decided to use both the masculine and the feminine nouns to refer to groups that include both men and women, although mixed groups are commonly referred to by the use of masculine forms in Portuguese. This is what the indexical that in line 1 refers to.
Mari's telling about her dissertation is received with a continuer (line 5) followed by a change-of-state token (line 6). This change-of-state token (Heritage 2016), along with Ivo's raised eyebrows, displays Ivo's surprise on learning about Mari's dissertation topic, one Ivo, as he himself had stated previously, was interested in. During a pause of one and a half seconds (line 7), Mari seems to be inviting elaboration related to Ivo's display of surprise, which is indicated by her silence and nods. At the same time, Mari could be waiting for a transmission problem to be resolved, as Ivo's turn (lines 5-6) can only be heard, not seen (see Ivo's image freezing in line 4). After this silence, Ivo makes his interest in the topic of Mari's dissertation more evident through the initiation of a new sequence projected by a question (line 9). This new sequence clashes with a potential topic expansion by Mari (line 8), producing a simultaneous start-up. The content of Mari's turn beginning suggests that she was going to advance the larger topic of the interaction at that point (sexism and language), instead of producing further talk on her dissertation.
As with most of the cases in our collection, what follows the overlapping talk is Mari's ongoing turn suspension. However, the design of her dropping out this time is considerably different from the previous examples. In Excerpt 4, the simultaneous start-up is resolved through the deployment of the most basic resource available to speakers in talk-in-interaction, i.e., to simply stop talking. The action of dropping out of the floor and thus yielding the floor to Ivo is not highlighted by the lip gesture, nor does Mari employ the utterance go ahead. It consists solely of stopping the production of her turn, nodding, and halting her right-hand movement and retracting the hand to her shoulder, where she lets it rest (line 9). This movement signals her withdrawal from the floor and indexes her recipiency state (Oloff 2013).
While the design of Mari's withdrawal in this occurrence (halt of talk plus the hand gesture retraction) allows Ivo to continue (as the other designs do), it does this in a way that does not impose on him in the same manner as a fiercely closed mouth with pressing lips or a verbal go ahead would. Seemingly, in this particular example, as well as in the other two examples found so far in our data in which Mari is the one talking about personal affairs, Mari's role of interactional manager/instructor is backgrounded; that is, momentarily, Mari and Ivo are not primarily oriented to their deontically asymmetric roles, but rather to their roles as unacquainted interactional partners (Svennevig 2000) who are currently engaged in learning more about each other's personal affairs. This example thus supports the claim that the backgrounding and foregrounding of the tutor's role of fostering learner participation is constituted by and reflected in how the simultaneous start-ups are oriented to and solved.

Discussion and Implications: Overlap Resolution and the Management of Language Use/Learning Opportunities
This paper has examined the resolution trajectory of simultaneous start-ups in video-mediated L2 tutoring. With a decided focus on the cases in which the tutor withdraws and yields the floor to the student, we observed the mobilization of a set of resources comprising the timely termination of talk, mouth and lip gestures, smiles, hand gestures, and the use of the expression go ahead. We have shown that these resources, locally used to more explicitly mark the tutor's withdrawal from the floor, support the student's incipient talk and reflect how participants' conduct is shaped by the medium. Furthermore, we have argued that this practice of enhanced explicitness secures the learner's interactional space in moments of overlap in VMI, and thereby indexes the tutor's role as the crafter of emergent language-use opportunities.
The present paper has implications for both VMI and CA-SLA research. Our analysis has unveiled the more prominent function that certain resources acquire in VMI. For example, we have shown how pressing the lips, a practice that has not been reported by previous studies in relation to overlapping talk, is recurrently used in our data. Through the prominence of her lip pressing gesture after dropping out of the floor, the tutor secures her recipient mode and makes use of her rights as interactional manager to influence the course of the overlap resolution. This finding supports earlier research (e.g., Heath and Luff 1993; Licoppe and Morel 2012) in suggesting that facial gestures gain prominence over hand gestures in VMI, which is an example of how participants exploit the 'talking heads' configuration (Licoppe and Morel 2012).
As this study is limited to one dyad, more research is needed in order to determine whether the enhanced explicitness strategy is likely to be encountered in interactions involving other conversational partners in VMI. Our analyses suggest that participants' deontic status may play a role in determining who can use the enhanced explicitness strategy for overlap resolution. Further research could verify whether this holds true for other contexts and with participants with more symmetrical deontic statuses. It would also be relevant to verify whether simultaneous start-ups are resolved differently over time, as Mari and Ivo become more familiar with the medium and more acquainted with each other. Future research could also investigate which other practices are used by L2 instructors and participants in VMI more generally to solve simultaneous start-ups as well as other types of interactional hurdles.
Another avenue for future VMI studies is the investigation of how micro-sequentiality is accomplished in video-mediated interaction. The notion of 'micro-sequentiality' is a recent development in multimodal CA, and seems crucial to explaining participants' orientations to bodily-visual behaviors (Mondada 2014). Cases such as the one shown in Excerpt 3 suggest that linguistic and embodied resources may be mobilized across an increasing scale of 'explicitness' (from simply stopping talking to uttering the expression 'go ahead') in order to allow participants to micro-adjust to each other's conduct. This mechanism should be further explored by multimodal conversation-analytic studies on VMI.
Our study contributes new understandings to CA-SLA research. Previous studies have claimed that language use is "the driving force for language development and language learning" (Eskildsen 2020, p. 59) and that an integral part of the specialized work that teachers do involves successfully managing learners' initiatives and creating language learning opportunities (e.g., Waring 2011; Girgin and Brandt 2020; Sert 2017). Our close analyses of simultaneous start-ups illustrate how such management of learners' self-initiated turns and crafting of language use opportunities is done in the understudied context of adult L2 tutoring with advanced learners. For one, this invites us to reconceptualize our understanding of 'language teaching'. In the context of L2 conversational tutoring, juggling the institutional roles of L2 teacher and student and that of conversational partner seems to be a pervasive concern for the participants. As the encounters are to a great extent designed to resemble everyday conversation, a calibration between being a teacher, with all the rights and obligations attached to it, and being a conversational partner, is presumably needed. Indeed, we observe that the categories of teacher and student may be brought to the surface of talk during moments when the progressivity of ongoing talk is at stake, which is the case with simultaneous start-ups. The negotiation that these events require, as the analysis of Excerpts 1-3 shows, mobilizes a set of practices that seem to actualize the institutional roles of teacher and student in what appear at first glance to be moments of everyday interaction with no fixed instructional agenda (e.g., a grammar point) in place.
All in all, our findings support the general claim that teachers are the "key designers" (Hall 2020, p. 12) of learning opportunities and reveal new facets of the embodied work of teaching (Hall and Looney 2019) that is mobilized in the increasingly popular video-mediated L2 settings. Our findings also provide further evidence for the claim that a core principle of teacher-student interaction is adapting ongoing talk to interactional contingencies (Malabarba 2015; Waring 2016).
In line with the goal of "[m]aking visible the practices and actions of how L2 teaching is accomplished" (Hall et al. 2020, p. 36) in order to produce useful insights for L2 instructors, these findings could be integrated into evidence-based teacher reflection and professionalization programs (see Hall et al. 2020; Glaser et al. 2019; Sert 2021; Ekin et al. 2021) to leverage L2 instructors' e-classroom interactional competence (Moorhouse et al. 2021). Specifically, our fine-grained analyses of how simultaneous start-ups are solved could help raise L2 instructors' awareness of the mechanics of dyadic L2 video-mediated instruction. Instructors who are less familiar with VMI should find these results relevant to understanding how pedagogical actions, such as leaving the floor to students in moments of overlap in video-mediated instruction, are accomplished by experienced instructors.

Informed Consent Statement: Informed written consent was obtained from the tutor and the tutee involved in this study.
Data Availability Statement: Due to privacy concerns, the data are not available to other researchers.