INTRODUCTION
Fluent, accurate speech production involves many sub-processes, including formulating utterances, accessing words from memory, and planning and executing articulation (Levelt, 1989). Relatively little work has examined the cognitive processes involved in speech production and speech articulation (Bock & Griffin, 2000), at least when compared to the wealth of research on speech articulation, and on speech and language perception and processing. When speech-language pathologists work with clients who have speech sound disorders (SSDs), they may focus on movements and biomechanics for speech. However, these same clients may also have cognitive and linguistic deficits affecting their communication abilities (e.g., Munson, Baylis, Krause, & Yim, 2010; Waring & Knight, 2013).
Recognizing the linguistic load of speech stimuli on speech production may be instrumental when selecting therapy materials and ultimately helping clients transition their new motor skills into real-life communication situations.
This study examines how manipulations known to affect the cognitive processes involved in planning speech production interact with jaw positioning using a bite block. Our broader motivation is to consider how motoric and linguistic effects on speech production may interact in clinical treatment protocols for children with SSDs. Indeed, there is some evidence that bite blocks might be useful clinical tools in therapy for children with a variety of SSDs. This comes from findings that children with various SSDs have difficulties in achieving independent control of the tongue and the jaw, and hence might benefit from training procedures that isolate the function of the tongue from that of the jaw (e.g., Edwards, 1992; Terband, Maasen, Van Lieshoult, & Nijland, 2011).
Numerous studies have used phonological form priming to examine various aspects of speech production (Zwitserlood, 1996 and references cited therein). Some of these studies have found facilitating effects of phonological priming (Jescheniak & Schriefers, 2001), whereas others reported inhibitory effects (Sevald & Dell, 1994). Vitevitch (2002) questioned the use of phonological form priming to study speech production, citing extensive literature critical of priming paradigms in general (e.g. Bowles & Poon, 1985). Priming paradigms lead participants to adopt experiment- and task-specific strategies when responding to experimental stimuli. Thus, data from these experiments may reflect commonalities in strategies for responding in priming paradigms, and may not provide evidence for form-related priming in normal speech production and perception.
Rather than priming, Vitevitch (2002) suggested examining speech production processes by using stimuli that vary in their lexical characteristics, and hence in their production demands. One strategy for this is to examine the production of words that vary in lexical difficulty. Lexical difficulty is often operationally defined as the combination of two factors: words’ frequency of use, and the size of their phonological neighborhood, i.e. the number of real words that can be created by adding, changing, or deleting a phoneme from a target word. Lexically difficult words are those that are from dense neighborhoods, and that are used infrequently. These words are identified more poorly than words that are high in frequency and are similar to relatively few other real words (i.e. words from sparse neighborhoods) (Pisoni & Luce, 1997).
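The neighborhood definition above (words formed by adding, changing, or deleting one phoneme) can be illustrated with a minimal sketch. The toy lexicon, its ASCII phoneme labels, and the function names are hypothetical and are not drawn from the Hoosier Mental Lexicon or from any analysis in this study.

```python
def is_neighbor(a, b):
    """True if phoneme sequences a and b differ by exactly one
    substituted, added, or deleted phoneme."""
    a, b = tuple(a), tuple(b)
    if a == b:
        return False  # a word is not its own neighbor
    if len(a) == len(b):
        # Substitution: exactly one position differs.
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    # Addition/deletion: dropping one phoneme from the longer form
    # must yield the shorter form.
    return any(longer[:i] + longer[i + 1:] == shorter
               for i in range(len(longer)))


def neighborhood_density(target, lexicon):
    """Count entries in lexicon that are neighbors of target."""
    return sum(is_neighbor(target, word) for word in lexicon)


# Hypothetical toy lexicon (orthographic key -> phoneme tuple).
toy_lexicon = {
    "cat": ("k", "ae", "t"),
    "bat": ("b", "ae", "t"),
    "cut": ("k", "ah", "t"),
    "cab": ("k", "ae", "b"),
    "at": ("ae", "t"),
    "cast": ("k", "ae", "s", "t"),
    "dog": ("d", "ao", "g"),
}

density = neighborhood_density(toy_lexicon["cat"], toy_lexicon.values())
print(density)
```

In this toy lexicon, "cat" has three substitution neighbors (bat, cut, cab), one deletion neighbor (at), and one addition neighbor (cast); "dog" is not a neighbor.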
Lexical factors also affect speech production. Words’ frequency of use affects the speed with which they are produced in picture-naming tasks (e.g. Balota & Chumbley, 1985). Words’ phonological neighborhood density affects the speed with which they are produced (Vitevitch, 2002). There is also evidence that lexical factors affect phonetic detail in speech. Formant distribution, or degree of dispersion in the F1/F2 vowel space, is a common metric for analyzing vowel production. Wright (2004) examined vowel productions as a function of lexical difficulty and found that vowels in lexically difficult words were produced with formant values closer to the periphery of the F1/F2 vowel space than those in easy words. Thus, the vowels in the lexically difficult words were more physically distinct from one another, and, presumably, more easily discriminated than those in the lexically easy words. This result is consistent with Lindblom’s (1992) H&H theory of speech production, which argues that talkers actively modify their articulation in different tasks and speaking environments to maintain an adequate level of intelligibility. Wright’s (2004) results suggest that talkers have a tacit awareness of the perceptual difficulties associated with lexically difficult words, and subtly modify their articulation to make them easier for listeners to perceive.
Munson and Solomon (2004) replicated Wright’s (2004) finding with a subset of the words from that study and extended it in two ways. First, they demonstrated that both components of lexical difficulty (frequency of use and phonological neighborhood density) had independent influences on vowel dispersion. Second, they showed that the influence of lexical difficulty on vowel dispersion was not due to a confounding effect of lexical difficulty on vowel duration. Vowel duration is positively correlated with vowel dispersion, such that shorter vowels tend to be produced farther from the periphery of the vowel space than longer vowels (Moon & Lindblom, 1994). It has been established previously that frequency of word usage influences word duration: high-frequency words are shorter than low-frequency words (Wright, 1979), and nonwords with frequent diphone sequences are shorter than ones with infrequent or non-occurring sequences (Munson, 2001). Munson and Solomon’s results suggest that Wright’s original findings on vowel dispersion were not due to a mediating influence of vowel duration.
Work subsequent to that by Munson and Solomon (2004) has shown numerous other effects of lexical difficulty on articulation, but a full review of this literature is outside the scope of this paper. A few representative findings are as follows. Baese-Berk and Goldrick (2009) showed that voice-onset time of voiceless stops is longer in lexically difficult words than in easy ones. Gahl, Yao, and Johnson (2012) demonstrated that the effect of lexical difficulty on vowel duration in connected speech is the opposite of what Wright (2004) and Munson and Solomon found in laboratory speech. Further work by Gahl (2015) provided evidence that the apparent effect of lexical difficulty on vowel production might be due to the distribution of consonants adjacent to vowels in hard and easy words and their coarticulatory effects on vowels, rather than to lexical competition. Buz and Jaeger (2016) failed to replicate Munson and Solomon’s finding that lexical competition affects vowel dispersion. They speculated that our findings reflected the presence of minimal pairs or near-minimal pairs in the set of lexically difficult stimuli, and that hyperarticulation reflected speakers’ intentional exaggeration of differences between words in the experiment.
The current study examines the combined effect of lexical difficulty and jaw positioning by a bite block on vowel dispersion. Numerous studies have demonstrated that speech produced by adults during an articulatory perturbation such as speaking with a bite block differs only minimally from speech produced in jaw-free conditions, both acoustically and kinematically (Gay, Lindblom, & Lubker, 1981; McFarland & Baum, 1995; Baum, McFarland, & Diab, 1996; Solomon, Makashay, & Munson, 2016). This compensation can occur in the absence of auditory and tactile-kinesthetic feedback (Kelso & Tuller, 1983), and reflects talkers’ ability to reorganize their articulation to achieve the task-directed goal of producing intelligible speech. We and others have demonstrated that small (2–5 mm) bite blocks have minor effects on speech, but that the impact on vowel formants and spectral characteristics of speech increases as jaw displacement increases (Lindblom & Sundberg, 1971; McFarland & Baum, 1995; Solomon et al., 2016). McFarland and Baum (1995) used a 10-mm bite block to perturb speech based on previous studies that predicted that this amount of jaw displacement would affect consonant and vowel productions while still allowing speech to be produced comfortably. Nonetheless, few differences emerged, and those that did mostly affected consonants.
The goal of the current study is to examine the interactive roles of lexical competition and jaw positioning on vowel articulation in normally speaking adults. Specifically, it examines the duration and F1/F2 dispersion of vowels in lexically easy and lexically difficult words spoken with and without a 10-mm bite block. Preemptively, this study first seeks to rule out the potential effect of vowel duration on vowel-space differences associated with lexical difficulty. If expanded vowel spaces result from increased duration, then we would predict strong, consistent correlations between those measures. Second, it examines whether vowel dispersion is associated with lexical difficulty because of the frequency of word usage, or because of active articulatory reorganization aimed at maximizing their distinctiveness. If dispersion differences are due to habitual reduction of lexically easy words, then we would predict that they would occur in the jaw-free condition but not in the bite-block condition. We believe that this study overall will help speech-language pathologists understand how stimulus characteristics might interact with jaw positioning in clinical treatment of individuals with developmental and acquired speech sound disorders.
METHODOLOGY
Participants
A group of 10 adults [8 women and 2 men, ages 20;11 (yr;mo) to 38;9 (M = 26;3, SD = 6;2)] participated in this study after providing informed consent, the protocol for which was approved by the University of Minnesota IRB for Human Subjects Research. All participants were native speakers of English, passed a hearing screening at 20 dB at 500 Hz, 1, 2, and 4 kHz (ANSI, 1989), and reported no history of speech, language, or hearing disorders. These participants have been described previously (Munson & Solomon, 2004; Solomon & Munson, 2004; Solomon et al., 2016). Participants received $10 per hour for their participation. The entire protocol, including making the bite blocks and participating in all of the speech and nonspeech tasks, took approximately 2 hours.
Materials
Stimuli
Stimuli included 30 CVC words, listed in Table 1, according to the pronunciation common to Minnesota. Certain pronunciations (e.g., wash, vote, both, goat, moat) differed from many other dialects of English: the mid-back round vowel is a long monophthong /o:/ rather than the diphthong /oʊ/, and both /a/ and /ɔ/ are produced as /a/. The six vowels chosen for this study and their distribution within the lists were consistent with previously published results (Wright, 2004) and in accordance with the additional constraint of choosing words based on lexical difficulty and phonetic factors like final consonant voicing.
The words were selected from the Hoosier Mental Lexicon (Pisoni, Nusbaum, Luce, & Slowiaczek, 1985), an on-line version of Webster’s pocket dictionary. Half of these were lexically difficult, in that they have a relatively low frequency of usage and are phonologically similar to many other real words. The remaining 15 words were lexically easy, meaning they have a relatively high frequency of usage and are phonologically similar to few other real words. All of the words in this study were highly familiar to undergraduate students (mean familiarity rating > 6.8 on a 7-point scale; Pisoni et al., 1985).
The two word lists contained equal numbers of words with the six vowels /i/, /ɪ/, /æ/, /a/, /o:/, and /u/ and were balanced for characteristics known to affect vowel duration. In particular, the two lists did not differ significantly in the distribution of words ending in voiced obstruents, voiceless obstruents, or voiced sonorant consonants [χ²(2) = 0.70, p > .05].
Bite Blocks
Bite blocks were created for each participant using the materials and procedures presented in Netsell (1985) and described in our related papers (Solomon & Munson, 2004; Solomon et al., 2016). For this study, participants used a bite block that separated the upper and lower molars by ~10 mm. The block was placed unilaterally (7 right, 3 left as described in Solomon & Munson, 2004).
Data Collection
The data reported in this paper were collected as part of a larger study examining the influence of a bite block on nonspeech (Solomon & Munson, 2004) and speech (Solomon et al., 2016) tasks, and the influence of lexical competition on speech (Experiment 1 in Munson & Solomon, 2004). The task included in this article was embedded randomly in the larger protocol, with the stipulation of at least 30 minutes for recovery after the tongue endurance tasks to avoid potential effects of muscle fatigue on speech production (Solomon, 2000).
Participants read the target words from 3 × 5 cards randomized for order for each jaw-positioning condition. The order of the jaw-positioning conditions (jaw free and 10-mm bite block) was counterbalanced across the participants. Each condition occurred during individual sessions, separated by at least one week, to minimize the chance that responses were influenced by stimulus familiarity. In addition, the word-reading task was initiated immediately after placing the 10-mm bite block to avoid accommodation to the articulatory perturbation (McFarland & Baum, 1995). Speech was recorded in a quiet room via a head-mounted microphone (AKG-C420 with Rolls phantom power source) onto a Roland VS-890 digital workstation (sampling rate = 44.1 kHz, 16-bit quantization) and subsequently transferred to a computer workstation.
Measurement
Speech acoustic analysis was conducted with signal-processing software (Praat v. 4.0.7, Boersma, 2001). Tokens produced with a disfluency or with extraneous noise were excluded, leaving between 111 and the full set of 120 tokens per talker (mean = 117). Missing tokens were distributed evenly among lexically difficult and easy words, and between the two jaw-positioning conditions.
Determination and reliability of vowel-duration measurements were described in detail previously (Munson & Solomon, 2004). Generally, vowel onset was defined as the beginning of periodicity in the acoustic waveform or a clear formant structure in the spectrogram. Vowel offset was taken as discontinuity in the amplitude or periodicity of the waveform or the end of a formant structure in the spectrogram. For F1/F2 dispersion, F1 and F2 were measured automatically at vowel midpoint using an LPC formant-tracking algorithm in Praat and hand-checked for accuracy as needed. Formant values were then converted to Bark values prior to computing F1/F2 dispersion (Zwicker & Terhardt, 1980).
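The Hz-to-Bark conversion can be sketched with the closed-form approximation commonly attributed to Zwicker and Terhardt (1980). The function name and the example frequency are illustrative assumptions; this is not the script used in the analysis reported here.

```python
import math


def hz_to_bark(f_hz):
    """Approximate critical-band rate (Bark) for a frequency in Hz,
    using z = 13*atan(0.00076*f) + 3.5*atan((f/7500)^2)."""
    return (13.0 * math.atan(0.00076 * f_hz)
            + 3.5 * math.atan((f_hz / 7500.0) ** 2))


# Illustrative value only: a 1000-Hz component maps to roughly 8.5 Bark.
example_bark = hz_to_bark(1000.0)
print(round(example_bark, 2))
```

Because the Bark scale compresses higher frequencies, a given difference in Hz contributes less to dispersion in the F2 range than in the F1 range after conversion.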
Dispersion in the F1/F2 space was computed following the method of Bradlow, Torretta, and Pisoni (1996). For each participant, F1/F2 dispersion was calculated separately for difficult and easy words produced in the jaw-free and bite-block conditions.
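As an illustration, the sketch below treats dispersion as the mean Euclidean distance of vowel tokens from a center point in the Bark-scaled F1/F2 plane, which is one common reading of the approach in Bradlow et al. (1996). The text above does not specify whether the center is computed per talker or per condition, so the optional center argument is an assumption, as are the function name and the toy values.

```python
import math


def vowel_dispersion(tokens, center=None):
    """Mean Euclidean distance of (F1, F2) tokens, in Bark, from a center.

    tokens: list of (f1_bark, f2_bark) pairs.
    center: optional (f1, f2) pair; defaults to the centroid of tokens
            (an assumption, since the reference point may instead be the
            talker's overall vowel-space center).
    """
    if center is None:
        c1 = sum(f1 for f1, _ in tokens) / len(tokens)
        c2 = sum(f2 for _, f2 in tokens) / len(tokens)
    else:
        c1, c2 = center
    return sum(math.hypot(f1 - c1, f2 - c2) for f1, f2 in tokens) / len(tokens)


# Hypothetical tokens at the corners of a square in Bark space.
toy_tokens = [(3.0, 10.0), (3.0, 14.0), (7.0, 10.0), (7.0, 14.0)]
toy_dispersion = vowel_dispersion(toy_tokens)
print(round(toy_dispersion, 3))
```

Each toy token lies 2√2 Bark from the centroid (5, 12), so the mean distance is also 2√2. Tokens closer to the periphery of the vowel space yield a larger value.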
RESULTS
A two-factor (lexical competition × jaw positioning) within-subjects ANOVA on average vowel duration showed a significant effect of lexical difficulty [F(1,9) = 12.13, p = .007], but no effect of jaw positioning [F(1,9) = 0.217, p = .652] and no interaction [F(1,9) = 1.522, p = .249]. Figure 1 shows vowel duration across the two conditions. As this figure shows, the vowels in lexically easy words were longer than those in lexically difficult words across both jaw-positioning conditions. Moreover, there was more variation across talkers in the durations of the lexically easy vowels in the jaw-free condition than in the other three conditions.
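Because each factor has two levels, every F(1,9) in a design of this kind is equivalent to the square of a one-sample t statistic computed on per-participant contrast scores. The sketch below illustrates that equivalence with invented cell means for three hypothetical participants. It is not the analysis code used here, and the function names and data are assumptions.

```python
import math


def f_from_contrast(scores):
    """Return (F, df_effect, df_error) for a 1-df within-subjects effect,
    computed as the squared one-sample t statistic on contrast scores.
    Assumes the scores are not all identical (nonzero variance)."""
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((s - mean) ** 2 for s in scores) / (n - 1)
    t = mean / math.sqrt(variance / n)
    return t * t, 1, n - 1


def two_by_two_effects(cells):
    """cells: one (easy_free, easy_block, hard_free, hard_block) tuple
    per participant. Returns F tests for both main effects and the
    interaction."""
    lexical = [(hf + hb) - (ef + eb) for ef, eb, hf, hb in cells]
    jaw = [(eb + hb) - (ef + hf) for ef, eb, hf, hb in cells]
    interaction = [(hb - eb) - (hf - ef) for ef, eb, hf, hb in cells]
    return {
        "lexical": f_from_contrast(lexical),
        "jaw": f_from_contrast(jaw),
        "interaction": f_from_contrast(interaction),
    }


# Invented values for illustration only (not data from this study).
toy_cells = [(1.0, 2.0, 3.0, 5.0), (2.0, 2.0, 4.0, 5.0), (1.0, 3.0, 4.0, 4.0)]
effects = two_by_two_effects(toy_cells)
for name, (f_value, df1, df2) in effects.items():
    print(name, round(f_value, 3), (df1, df2))
```

With 10 participants, the error degrees of freedom would be 9, matching the F(1,9) values reported in this section.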
A two-factor (lexical competition × jaw positioning) within-subjects ANOVA on vowel dispersion showed a significant effect of lexical difficulty [F(1,9) = 33.69, p < .001], but no effect of jaw positioning [F(1,9) = 1.590, p = .239] and no interaction [F(1,9) = 2.648, p = .138]. Figure 2 shows the vowel dispersion across the two conditions. As this figure shows, the vowel dispersion was greater in the lexically hard words across the two jaw-positioning conditions.
Figure 3 plots the vowels in the F1/F2 space, separated by jaw position and lexical difficulty. Neither F1 nor F2 varied systematically as a function of jaw fixation, lexical difficulty, or their interaction [F1: for lexical difficulty, F(1,9) = 0.064, p = .806; for jaw fixation, F(1,9) = 0.469, p = .511; interaction, F(1,9) = 0.632, p = .447; F2: for lexical difficulty, F(1,9) = 0.218, p = .652; for jaw fixation, F(1,9) = 0.810, p = .391; interaction, F(1,9) = 1.209, p = .130].
When data were collapsed across conditions, the correlation between vowel duration and F1/F2 dispersion was significant (r = -.331, p < .05). The relationship was negative: words with shorter vowels tended to be produced with greater dispersion. This is inconsistent with the notion that differences in vowel dispersion across levels of lexical difficulty were due to differences in the duration of the vowels. Moreover, when data were examined within each condition, correlation coefficients did not meet criterion for significance. This was true regardless of whether a Pearson or Spearman correlation coefficient was examined.
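The two coefficients mentioned above differ in what they measure: Pearson's r is computed on the raw values, whereas Spearman's rho is Pearson's r computed on ranks. The following minimal sketch shows both; the duration and dispersion values are invented for illustration and are not data from this study, and the function names are assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Invented values: vowel duration (ms) and F1/F2 dispersion (Bark).
toy_duration = [150.0, 165.0, 180.0, 200.0, 220.0]
toy_dispersion_values = [3.1, 2.9, 2.8, 2.5, 2.6]

r = pearson_r(toy_duration, toy_dispersion_values)
rho = spearman_rho(toy_duration, toy_dispersion_values)
print(round(rho, 3))
```

In the toy data, longer durations go with smaller dispersion values, so both coefficients are negative, mirroring the direction (not the magnitude) of the relationship reported above.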
DISCUSSION
This study examined the compounding effects of lexical competition and articulatory perturbation on the speech of normal young adults. It confirmed that CVC words that are infrequent in the lexicon and have many phonological neighbors (i.e., lexically difficult) were produced with more dispersed vowel spaces than those that are frequent and have relatively few phonological neighbors (i.e., lexically easy).
Figure 3.
Scatterplot showing the relationship between vowel-space dispersion and average duration, separated by jaw-positioning condition and lexical difficulty. Open symbols are lexically easy words, filled symbols are lexically difficult words. Jaw-free productions are circles and bite-block productions are squares.
An unexpected finding in this study was that the vowels in lexically difficult words were produced with shorter durations than those in lexically easy words. Although this finding may appear to contradict previous research on the influence of word frequency on duration (Munson, 2001; Wright, 1979), it is not incompatible with it. One difference is that those studies examined words that differed in written word frequency, whereas the current study examined words that differed both in frequency and in neighborhood density. Based on findings by Vitevitch (2002), one could also hypothesize that the difference in duration between the easy and difficult words reflects a trade-off between the latency with which a word is uttered and its duration. Future research should examine the relative contribution of frequency and neighborhood density to the duration of different parts of words. Nonetheless, this unexpected result dispelled concerns that the greater vowel dispersion in lexically difficult words was a by-product of longer vowel duration in this study.
Furthermore, the effect of lexical difficulty on vowel dispersion did not appear to be due to the articulatory reduction of easy words, as differences in dispersion were more marked when participants spoke with a 10-mm bite block. In that condition, talkers were forced to reorganize their articulation to compensate for the bite block, and thus could not use a habitual, reduced articulation for the lexically easy words. Taken together, these results support Wright’s (2004) hypothesis that the differences between lexically easy and lexically difficult words reflect either conscious or unconscious modifications aimed at maximizing the distinctiveness of vowels in lexically difficult words.
One criticism of this work that has been raised recently concerns the extent to which the hyperarticulation of lexically difficult words reflects speakers’ responses to the lexical contrasts in the set of stimuli used in the experiment itself. Recall that Buz and Jaeger (2016) were not able to replicate Munson and Solomon’s (2004) finding. They speculated that the hyperarticulation noted by Wright and by Munson and Solomon was due to the existence of near-minimal pairs in the stimulus set. The existence of minimal pairs or near-minimal pairs might have prompted the participants to hyperarticulate their productions to differentiate them from other stimuli in the experiment, perhaps including words that had been recently uttered by the participants. This possibility is intriguing, but evaluating it is outside the scope of this paper. However, it is worth noting that the use of a small set of stimuli that constitute minimal or near-minimal pairs is common in speech therapy. Hence, the current result provides useful information about the types of production differences that might emerge during speech and language therapy.
Figure 4.
Average F1 and F2 values for the six vowels in the four conditions. Open symbols are lexically easy words, filled symbols are lexically difficult words. Jaw-free productions are circles and bite-block productions are squares. Data were transformed to z-scores by individuals to facilitate comparison across talkers with different mean formant frequencies overall.