4.1. General Results
In order to answer the first research question, the comparison present–progressive stimuli are set aside, and the analysis focuses exclusively on the control stimuli. The control stimulus ratings provided during the AJT were averaged across all participants. The results show that the group as a whole exhibited the expected grammatical distinctions based on switch type. The acceptable control stimuli were rated nearer the acceptable end of the scale (M = …, SD = 1.76), whereas the unacceptable control stimuli were rated in the bottom half of the acceptability scale (M = …, SD = 2.30), as shown in Figure 1.
Interestingly, participants not only differentiated between the acceptable and unacceptable control stimuli but also rated the stimuli uniformly within each of the two broad groups. The three types of acceptable control stimuli patterned almost identically, with means within about a quarter of a point of one another, as did the two types of unacceptable control stimuli, with means within about a third of a point.
A one-way ANOVA revealed a significant effect of switch type (F(4,595) = 43.622, p < 0.001), consistent with the acceptable control stimuli being rated higher than the unacceptable control stimuli. Importantly, the Tukey post hoc test revealed no significant differences either among the acceptable switch types (p > 0.05) or between the unacceptable switch types (p = 0.590). Furthermore, an item analysis revealed no significant differences among the lexicalizations within each specific switch type (p > 0.05).
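For readers who want to see what the F statistic above summarizes, the following is a minimal sketch of a one-way ANOVA in plain Python. It is not the software used in this study, and the function name `one_way_anova` is hypothetical. The example assumes, working backwards from the reported degrees of freedom, 600 control ratings split evenly into five switch types of 120 ratings each; the ratings themselves are invented.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` holds one list of ratings per switch type.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Between-groups sum of squares: how far each group mean lies from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: how far each rating lies from its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Invented ratings: five switch types with 120 ratings each (600 in total)
groups = [[t + (i % 3) for i in range(120)] for t in range(1, 6)]
f_stat, df_b, df_w = one_way_anova(groups)
print(df_b, df_w)  # 4 595
```

Because the toy ratings are invented, the F value itself is not meaningful here; only the degrees of freedom mirror those reported above. In practice, a library routine such as SciPy's `scipy.stats.f_oneway` computes the same statistic along with its p-value.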
Returning to the first research question, these results show no evidence of variation among the different types of control stimuli within each category in this particular CS study. Participants consistently rated complex-sentence, subject–predicate, and direct-object switches as acceptable, while pronoun and present–perfect switches received comparably low ratings. In the subsequent analyses, the switch types are therefore collapsed into their two broader categories of acceptable and unacceptable switches.
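The collapsing step amounts to relabeling each switch type with its broad category and then averaging. The sketch below assumes ratings are stored as (switch type, rating) pairs; the labels and the `collapse` helper are hypothetical illustrations, not the study's actual data format.

```python
from statistics import mean

# Hypothetical labels for the five control switch types and their broad category
SWITCH_CATEGORY = {
    "complex-sentence": "acceptable",
    "subject-predicate": "acceptable",
    "direct-object": "acceptable",
    "pronoun": "unacceptable",
    "present-perfect": "unacceptable",
}

def collapse(ratings):
    """Average (switch_type, rating) pairs within the two broad categories."""
    by_category = {}
    for switch_type, rating in ratings:
        category = SWITCH_CATEGORY[switch_type]
        by_category.setdefault(category, []).append(rating)
    return {category: mean(values) for category, values in by_category.items()}

print(collapse([("pronoun", 2), ("present-perfect", 4),
                ("direct-object", 6), ("complex-sentence", 7)]))
# {'unacceptable': 3, 'acceptable': 6.5}
```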
4.2. Heterogeneity Results
To answer the second research question, mean ratings for the control stimuli were calculated for each individual participant. There was a high degree of variability among the participants, as shown in Figure 2. Descriptively, though, almost everyone (Participants 1 through 18) rated the acceptable control stimuli higher than the unacceptable control stimuli. Within this group, however, some participants separated the two at the extreme ends of the scale, whereas others showed only a marginal difference in mean ratings. The difference in mean ratings for these individuals ranges from almost 5 points (i.e., 79.2% of the scale) to less than a quarter of a point (i.e., 3.2% of the scale). Finally, two individuals (Participants 19 and 20) rated the unacceptable control stimuli higher than the acceptable control stimuli, but only minimally, within half a point (i.e., 7.4% of the scale).
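The percentages above express each participant's difference in mean ratings relative to the range of the rating scale. The reported values are consistent with a scale spanning 6 points, such as 1 to 7 (e.g., 3 points = 50.0%), so the sketch below assumes that range; the function names are hypothetical.

```python
from statistics import mean

# Assumption: a scale running from 1 to 7 (a 6-point range), inferred from the
# reported percentages rather than stated in this section.
SCALE_MIN, SCALE_MAX = 1, 7

def distinction(acceptable, unacceptable):
    """Mean acceptable rating minus mean unacceptable rating for one participant.
    A negative value means the unacceptable controls were rated higher."""
    return mean(acceptable) - mean(unacceptable)

def pct_of_scale(diff, lo=SCALE_MIN, hi=SCALE_MAX):
    """Express the size of a rating difference as a percentage of the scale range."""
    return 100 * abs(diff) / (hi - lo)

# Hypothetical participant: acceptable controls average 6, unacceptable average 3
d = distinction([6, 7, 5], [3, 2, 4])
print(d, pct_of_scale(d))  # 3 50.0
```

Taking the absolute value lets the same function describe reversed patterns such as those of Participants 19 and 20, whose differences are negative.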
A two-way ANOVA revealed a statistically significant interaction between participant and stimulus type, F(19,560) = 6.781, p < 0.001. Simple main effects analysis showed that Participants 1 through 14 rated the acceptable control stimuli significantly higher than the unacceptable control stimuli (p < 0.05), whereas for Participants 15 through 20 the difference was not significant (p > 0.05). In the subsequent analyses, these groups will be referred to as the Distinction Group and the No Distinction Group, respectively.
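As a consistency check on the reported statistics, the interaction in a full-factorial two-way ANOVA with factors of a and b levels has (a − 1)(b − 1) degrees of freedom, and the error term has N − ab, where N is the number of observations. The sketch below applies these formulas to the two-way ANOVAs in this section. The observation counts (600 control ratings; 720 and 504 ratings once the present–progressive items are included) are assumptions inferred by working backwards from the reported F statistics, not figures stated in the text.

```python
def two_way_df(levels_a, levels_b, n_obs):
    """Interaction and error degrees of freedom for a full-factorial
    two-way ANOVA with n_obs observations."""
    df_interaction = (levels_a - 1) * (levels_b - 1)
    df_error = n_obs - levels_a * levels_b
    return df_interaction, df_error

# Participant (20) x stimulus type (2), 600 control ratings (assumed)
print(two_way_df(20, 2, 600))   # (19, 560)
# Participant group (2) x stimulus type (3), 720 ratings (assumed)
print(two_way_df(2, 3, 720))    # (2, 714)
# Distinction Group participant (14) x stimulus type (3), 504 ratings (assumed)
print(two_way_df(14, 3, 504))   # (26, 462)
```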
Returning to the second research question, these results show that control stimuli can be used to help sort out heterogeneity among CS intuitions. The control stimuli made it possible to identify participants who did not rate the switch types according to their expected acceptability, and participants could thus be grouped according to whether or not they made a categorical distinction between acceptable and unacceptable switches.
4.3. Comparison Results
To answer the final research question, the comparison present–progressive stimuli were included in the analysis. First, we can again look at the mean ratings for the participants as a whole. Across the board, the participants rated the present–progressive switches in between the two control stimulus types (M = 4.40, SD = 2.30). When these results are separated into the Distinction Group and the No Distinction Group, the mean for the present–progressive switches does not change much, but their acceptability relative to the control stimuli does, as shown in Figure 3. The No Distinction Group seems to rate all three stimulus types equally (M = 4.86, SD = 2.16), whereas the Distinction Group shows a clear hierarchy, with the acceptable switches at the top (M = 5.99, SD = 1.52), the unacceptable switches at the bottom (M = 3.04, SD = 2.16), and the present–progressive switches occupying a middle tier (M = 4.37, SD = …).
A two-way ANOVA revealed a statistically significant interaction between participant group and stimulus type, F(2,714) = 25.819, p < 0.001. Simple main effects analysis showed that the Distinction Group rated each stimulus type differently, with the acceptable stimuli rated significantly higher than both other types (p < 0.001) and the present–progressive stimuli rated significantly higher than the unacceptable stimuli (p < 0.001). The No Distinction Group, on the other hand, showed no significant differences among the three stimulus types (p > 0.05). In other words, the Distinction Group exhibited the expected pattern, with the comparison structure falling in the middle of the continuous acceptability spectrum, whereas the No Distinction Group showed no such pattern. Finally, an item analysis revealed a significant difference among the lexicalizations of the present–progressive switch type. Specifically, one item received lower ratings (M = 2.75, SD = 1.97) than half of the lexicalizations of that type (p < 0.05): Sus amigas están going shopping with their mothers this weekend ‘His/her/their friends are going shopping with their mothers this weekend’. This reduction is likely an artifact of the switch occurring with the English phrasal verb go shopping, whereas the remaining lexicalizations include non-phrasal verbs. Interestingly, with that item removed, the mean for the present–progressive switches would increase slightly for the Distinction Group (M = 4.50, SD = 2.32) but decrease for the No Distinction Group (M = 4.37, SD = 2.32). Because this item was not a clear outlier, in that its ratings did not differ significantly from those of two of the other items, it was retained in the dataset.
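The item-removal comparison amounts to recomputing a group mean with one lexicalization excluded. A minimal sketch, assuming ratings are stored as (item, rating) pairs; the helper name, item labels, and values are hypothetical and do not reproduce the reported means.

```python
from statistics import mean

def mean_with_and_without(ratings, item_to_drop):
    """ratings: list of (item_id, rating) pairs for one group.
    Returns the mean over all items and the mean with one item excluded."""
    all_values = [rating for _, rating in ratings]
    kept_values = [rating for item, rating in ratings if item != item_to_drop]
    return mean(all_values), mean(kept_values)

# Hypothetical ratings for three present-progressive items
ratings = [("go-shopping", 2), ("item-b", 5), ("item-c", 5)]
print(mean_with_and_without(ratings, "go-shopping"))  # (4, 5)
```

Running the same function separately for each participant group shows how one low-rated item can pull the group means in different directions, as it does for the Distinction and No Distinction Groups.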
As done previously with the control stimuli, we can also look at the comparison results by individual participant. Again, there is considerable variation, as seen in Figure 4. One trend that carries over from the participant-group results is that the ratings in the No Distinction Group are almost identical regardless of stimulus type. Although some individuals in this group rated the stimuli nearer the top of the scale (Participants 15 through 18) and others nearer the bottom (Participants 19 and 20), no participant in the No Distinction Group exhibited any clear difference among the mean ratings for the acceptable, unacceptable, and present–progressive switches. Within the Distinction Group, however, there is more variety. One commonality is that all but two of the participants in this group (Participants 9 and 14) rated the present–progressive switches in between the acceptable and unacceptable switches. For these participants, though, the difference between the mean rating for the present–progressive switches and that for either control switch type (acceptable or unacceptable) ranges from 3 points (i.e., 50.0% of the scale) to 0.6 of a point (i.e., 10.0% of the scale). In other words, the proximity of the present–progressive switches' mean rating to that of either control stimulus type varies substantially from individual to individual. For Participants 9 and 14, although the present–progressive switches were rated lower than the unacceptable switches, the difference was within about half a point (i.e., 11.2% of the scale).
Isolating just the Distinction Group, a two-way ANOVA revealed a statistically significant interaction between participant and stimulus type, F(26,462) = 2.200, p = 0.001. Simple main effects analysis showed that the participants can be divided into four groups. First, Participants 1, 6, and 9 rated the present–progressive switches significantly lower than the acceptable switches (p < 0.05) and not significantly differently from the unacceptable switches (p > 0.05). This group will henceforth be referred to as the No-Switch PrsProg Group, as their ratings of the present–progressive switch can be considered unacceptable. The opposite is true for Participants 3 through 5 and Participant 8, who rated the present–progressive switches significantly higher than the unacceptable switches (p < 0.05) and not significantly differently from the acceptable switches (p > 0.05). This group will be referred to as the Switch PrsProg Group, as they considered the present–progressive switch acceptable. For the other half of the participants, mean ratings for the comparison stimuli are less clearly defined in terms of (un)acceptability. Participant 2 was the one individual whose mean ratings for the present–progressive switches occupied a true middle tier, as they were both significantly higher than the unacceptable switches (p < 0.05) and significantly lower than the acceptable switches (p < 0.05). Participant 7 and Participants 10 through 14 also had mean ratings for the present–progressive switches in a middle position, but in the sense that they were not significantly different from either control stimulus type (p > 0.05). These individuals will collectively be referred to as the Questionable PrsProg Group, as in both cases it is unclear whether they considered the present–progressive switch truly acceptable.
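The decision logic behind this four-way split can be written out explicitly. The sketch below is a hypothetical helper, not the procedure used to produce the results: it assumes two simple-effects p-values per participant (present–progressive vs. acceptable, and present–progressive vs. unacceptable) and an alpha of 0.05, and it ignores the direction of the differences, which in the reported data ran in the expected direction.

```python
ALPHA = 0.05  # assumed significance threshold

def classify_prsprog(p_vs_acceptable, p_vs_unacceptable, alpha=ALPHA):
    """Assign a Distinction Group participant to a present-progressive group
    from two simple-effects p-values (PP vs. acceptable, PP vs. unacceptable)."""
    differs_from_acc = p_vs_acceptable < alpha
    differs_from_unacc = p_vs_unacceptable < alpha
    if differs_from_acc and not differs_from_unacc:
        return "No-Switch PrsProg"                   # patterns with unacceptable switches
    if differs_from_unacc and not differs_from_acc:
        return "Switch PrsProg"                      # patterns with acceptable switches
    if differs_from_acc and differs_from_unacc:
        return "Questionable PrsProg (middle tier)"  # the pattern of Participant 2
    return "Questionable PrsProg (indistinct)"       # the pattern of Participants 7, 10-14

# Hypothetical p-values illustrating each outcome
for p_acc, p_unacc in [(0.01, 0.40), (0.30, 0.02), (0.01, 0.01), (0.20, 0.30)]:
    print(p_acc, p_unacc, classify_prsprog(p_acc, p_unacc))
```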
These results show that the heterogeneity of the participant responses regarding present–progressive switches can be accounted for by using the control stimuli as a comparison. Specifically, half of the Distinction Group exhibited a clearly defined (un)acceptability for present–progressive switches, with that subgroup split roughly evenly between those who considered them acceptable (four participants) and those who considered them unacceptable (three participants). For the remaining half, present–progressive switches were neither clearly acceptable nor clearly unacceptable.
The first crucial finding of this study is that the acceptable control stimulus types were rated consistently with one another, as were the unacceptable control stimulus types. The specific type of switch involved was not relevant: participants rated the commonly cited acceptable switches (complex-sentence switches, subject–predicate switches, and direct-object switches) equally, and they did the same for the commonly cited unacceptable switches (pronoun switches and present–perfect switches). This finding should not be taken for granted within experimental CS research. First, it confirms previous literature stating that such switches are (un)grammatical for bilingual speakers. More importantly, as mentioned before, bilinguals are often uncertain and/or inconsistent when rating code-switched sentences, so the fact that participants were able to use the acceptability scale consistently for these switch types helps validate that reliable results can be obtained from an AJT. Moreover, these results show that the particular type of canonical grammatical or ungrammatical switch does not have an effect; participants treated them as two broad types, in line with the categorical nature of grammaticality proposed by syntactic theory. Of course, the current study is limited in that it only included five types of control stimuli. It remains to be seen whether other commonly cited switches or switch restrictions would pattern identically. Future research could include more varied grammatical switches, such as adjuncts like adverbial phrases or prepositional phrases, or ungrammatical switches, such as those involving clitics and negation.
Another important outcome of this study is that it presented a systematic way of isolating heterogeneity among the participants. Recall that a subset of the participants, the No Distinction Group, rated all of the control stimuli equally. This result could mean one of two things. The first option is that the I-languages of these individuals in fact include no grammatical distinction between such switches. It is possible, for example, that these bilingual speakers find a switch at the clausal boundary of a complex sentence just as acceptable as a switch between the verb have/haber and its past participle. Analyses of the participants' language profiles were conducted to determine whether anything in the linguistic backgrounds of the No Distinction Group could account for such a difference in I-languages. No significant differences were found between the two participant groups with regard to language exposure, self-rated proficiency, age of acquisition of Spanish, or score on the English proficiency measure. However, the No Distinction Group scored significantly lower on the Spanish proficiency measure, t(18) = 3.423, p = 0.003, with all of these participants in the intermediate range, whereas the Distinction Group was a mixture of intermediate and advanced. The No Distinction Group also started acquiring English significantly earlier, t(18) = 2.658, p = 0.016, as they all did so from birth, whereas the Distinction Group was again more mixed in this regard. These results underscore the potential impact of proficiency and age of acquisition on CS behavior, and these factors may have influenced the results. Future research could aim to better control for these two variables.
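The degrees of freedom reported for these comparisons, t(18), match an independent-samples t-test with pooled variance between groups of 14 and 6 participants (14 + 6 − 2 = 18). A minimal sketch of that statistic follows. That the original analysis used exactly this (Student's, equal-variance) variant is an assumption based on the degrees of freedom, and the example scores are invented.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Student's independent-samples t statistic (equal variances assumed)
    and its degrees of freedom, n1 + n2 - 2."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance, weighting each sample variance by its df
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df

# Invented proficiency scores: 14 Distinction vs. 6 No Distinction participants
distinction_scores = [40, 42, 45, 38, 44, 47, 41, 43, 46, 39, 48, 42, 44, 45]
no_distinction_scores = [35, 37, 36, 38, 34, 36]
t, df = pooled_t(distinction_scores, no_distinction_scores)
print(df)  # 18
```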
A second option, though, is that the participants in the No Distinction Group (or perhaps a subset of them) did not complete the task appropriately. The data do not allow us to determine how or why the AJT might have failed to tap into their linguistic competence. However, as an illustrative example, it is not unreasonable to consider Participants 19 and 20's low ratings of complex-sentence switches (M = 2.50, SD = 1.45) as emblematic of some extra-grammatical influence. Recall that such switches are equivalent to inter-sentential CS (i.e., switching at the sentence boundary), so the fact that these two participants did not rate even these switches closer to the acceptable end of the scale could be the result of a general depression of ratings across the board. This reduction could be due to a bias stemming from the stigmatization of CS; however, this is pure conjecture at this point, as no related data were collected from participants to say for sure.8
Importantly, through the use of the control stimuli, these individuals can be justifiably removed or isolated from the rest of the dataset. Moreover, such a decision can be made completely independently of the target stimuli/phenomena under analysis.
The identification of heterogeneity also played a key role when using control stimuli to establish a baseline comparison of (un)acceptability. Had the experiment been designed using comparison stimuli that directly modeled the target present–progressive stimuli, they could have looked like the sentence in (5):
(5) … are seeing many movies this month.
‘Her colleagues are seeing many movies this month.’
The sentence in (5) is comparable in that it uses almost the exact same lexical material but changes the auxiliary verb from Spanish to English (i.e., replacing están ‘are’ with are), which creates an acceptable subject–predicate switch. However, recall that the No Distinction Group rated all stimulus types the same, including other (non-lexically equivalent) subject–predicate switches. Without the acceptable and unacceptable control stimuli, the findings for these participants would have suggested that they consider present–progressive switches as acceptable as subject–predicate switches, which, as discussed above, is the less likely option compared to assuming that such individuals did not accurately complete the task. Regardless of which of the two options is true, though, it is with the help of the unrelated control stimuli that such individuals can be isolated from the rest of the dataset, a necessary step in either scenario.
As for the Distinction Group, the control stimuli were also effective at establishing a baseline comparison of (un)acceptability for the present–progressive switches. Again, if we had included only the comparison sentence in (5), the interpretation of the data would be quite different. In that case, the comparison would be solely between the present–progressive switches and the acceptable switches, as there would be no unacceptable switches to compare against. The previously identified members of the No-Switch PrsProg Group would not change, having rated the present–progressive switches significantly lower than the acceptable switches. Nor would the members of the Switch PrsProg Group change, for the opposite reason. However, the Questionable PrsProg Group would no longer exist. Participant 2 would be lumped in with the No-Switch PrsProg Group, as without unacceptable ratings there would be no lower bound against which to define a middle tier for the acceptability of present–progressive switches. The remaining participants would join the Switch PrsProg Group, as there would no longer be any unacceptable ratings from which their present–progressive ratings were indistinguishable; only their equivalence to the acceptable switches would remain. This shifting of groups results in exactly half of the participants in the Distinction Group (7 of 14) changing affiliation. Although the exact reason why these participants exhibited such variability with the present–progressive switches is beyond the scope of this study, it is clear that the interpretation of the results, and the subsequent analyses derived from the data, would differ drastically between the two scenarios. For example, at the broadest level, including the acceptable and unacceptable control stimuli suggests that for many of these bilinguals, the grammaticality/acceptability of present–progressive switches is nebulous and worthy of more fine-tuned investigation.
With just the direct comparison stimuli (i.e., only subject–predicate switches), a researcher could plausibly conclude that there are merely two groups of individuals: those who accept present–progressive switches and those who do not.
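The regrouping argument can be checked mechanically. The sketch below encodes, for each Distinction Group participant, whether their present–progressive ratings differed significantly from the acceptable and the unacceptable switches, as reported in Section 4.3, and then reassigns groups as if only the acceptable comparison were available. The helper names are hypothetical, and the encoding simplifies each participant to two yes/no outcomes.

```python
# Per Distinction Group participant, as reported in Section 4.3:
# (PP differs significantly from acceptable?, PP differs significantly from unacceptable?)
PATTERNS = {
    1: (True, False), 6: (True, False), 9: (True, False),                    # No-Switch PrsProg
    3: (False, True), 4: (False, True), 5: (False, True), 8: (False, True),  # Switch PrsProg
    2: (True, True),                                                         # middle tier
    7: (False, False), 10: (False, False), 11: (False, False),
    12: (False, False), 13: (False, False), 14: (False, False),              # indistinct
}

def group_with_both_controls(diff_acc, diff_unacc):
    if diff_acc and not diff_unacc:
        return "No-Switch"
    if diff_unacc and not diff_acc:
        return "Switch"
    return "Questionable"

def group_with_acceptable_only(diff_acc):
    # Without unacceptable controls, only the comparison to acceptable switches remains
    return "No-Switch" if diff_acc else "Switch"

changed = [p for p, (acc, unacc) in PATTERNS.items()
           if group_with_both_controls(acc, unacc) != group_with_acceptable_only(acc)]
print(f"{len(changed)} of {len(PATTERNS)} participants change group")
# 7 of 14 participants change group
```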
Playing devil's advocate, one could note that including the acceptable and unacceptable control stimuli creates more complications than using just a direct comparison. In this case, it created a middle tier of individuals whose acceptance of present–progressive switches is hazy. One could argue that this outcome is not ideal, as it becomes unclear how to provide an analysis if we indeed assume acceptability is categorical. How should one proceed in such instances? For one, such a middle tier of acceptability does not constitute evidence against its categorical nature. I would argue that such a result merely highlights the need for more fine-tuned examination of the structure in question. Here we only investigated a small sample of switches within the present–progressive construction between Spanish and English. It is likely that with a broader set of stimuli that include more variables (e.g., controlling for the potential issue with phrasal verbs), a follow-up study could provide more clearly delineated binary results.
It is crucial to point out that, although this paper aims to accommodate heterogeneity, the mixture of participants included can still be considered a limitation. Although various factors were controlled for, including age of acquisition and proficiency, the bilinguals included here come from different linguistic backgrounds and distinct speech communities, which may have affected the results. Future research could repeat this study with a more homogeneous group. Nevertheless, as mentioned in the methods, the fact that the monolingual versions of the stimuli tested here showed no variation among the participants is a promising indication that their intuitions regarding these structures are relatively homogeneous in each of the languages. However, as mentioned earlier, age of acquisition and proficiency could be playing a role in CS patterns. A related limitation concerns the statistical analyses used in this study, which are ANOVAs rather than mixed-effects models. Although not employed here, the latter offer an additional way to model differences across individuals (e.g., through by-participant random effects), which would further aid in addressing the heterogeneity discussed throughout this paper.
To conclude, I would like to recommend a general procedure for including control stimuli in experimental CS research. First, regardless of the syntactic structure under investigation, it is essential to include at least some form of control stimuli in the AJT. If direct comparison control stimuli can be created (e.g., by merely moving the switch point slightly), they can and should be part of the dataset. However, because such comparisons can only ever provide relative acceptability with no indication of unacceptability, unrelated control stimuli should also be included. I recommend including all the types of control stimuli tested here: complex-sentence switches, subject–predicate switches, direct-object switches, pronoun switches, and present–perfect switches.9
The advantage of using all of these types is that it adds variety to the set of stimuli, thus helping to distract participants from deciphering what the target structure is. As for quantity, each set of control stimuli should contain the same number of items as each set of target stimuli under investigation. For example, a study could be designed using one quarter target stimuli, one quarter direct-comparison control stimuli, one quarter unrelated control stimuli, and one quarter additional filler stimuli. Or, if no direct comparison can be made for the target, the stimuli can be divided into thirds. Once the data have been collected from the AJT, participants should be grouped based on whether they show the expected distinction between the unrelated control stimuli. Individuals who show no distinction between the acceptable and unacceptable control stimuli should be either isolated or removed from the target data analysis. Within that analysis, the acceptable and unacceptable control stimuli should be used as a baseline against which to evaluate the acceptability of the target stimuli (and the comparison control stimuli). By following these steps, experimental CS research can more effectively address methodological concerns, specifically heterogeneity among participants and the operationalization of (un)acceptability.
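The two quantitative steps of this procedure, allocating stimulus sets and screening participants, can be sketched as follows. This is a hypothetical illustration under stated assumptions: each participant's screening outcome is summarized by their mean acceptable and unacceptable control ratings plus a p-value from some per-participant test, alpha is set to 0.05, and the function names are illustrative only.

```python
def allocate_stimuli(n_target, has_direct_comparison):
    """Number of items per stimulus set; every set matches the target set in size,
    giving quarters with a direct comparison and thirds without one."""
    sets = ["target", "unrelated control", "filler"]
    if has_direct_comparison:
        sets.insert(1, "direct-comparison control")
    return {name: n_target for name in sets}

def screen(participants, alpha=0.05):
    """participants: dict of id -> (mean_acceptable, mean_unacceptable, p_value).
    Returns (distinction, no_distinction) lists of participant ids."""
    distinction, no_distinction = [], []
    for pid, (m_acc, m_unacc, p) in participants.items():
        # Expected pattern: acceptable controls rated reliably higher than unacceptable ones
        if m_acc > m_unacc and p < alpha:
            distinction.append(pid)
        else:
            no_distinction.append(pid)
    return distinction, no_distinction

print(allocate_stimuli(12, has_direct_comparison=True))
# Hypothetical screening data for three participants
print(screen({1: (6.1, 2.0, 0.001), 15: (5.0, 4.9, 0.60), 19: (3.0, 3.4, 0.45)}))
# ([1], [15, 19])
```

Participants returned in the second list would then be isolated or removed before the target analysis, as recommended above.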