The asymmetric effect of local context on word duration: Consequences for models of production

William D. Raymond1, Michelle Gregory2, Daniel Jurafsky2, and Alan Bell2
raymond@ling.ohio-state.edu, gregorml@ucsu.colorado.edu, jurafsky@colorado.edu
abell@psych.colorado.edu
1 Dept. of Linguistics, The Ohio State University
2 Dept. of Linguistics, University of Colorado

The important effect of following local context on word duration in production is well established. Word durations are longer for words that occur in phrase-final position (Klatt 1975, inter alia), pre-pausally (Bell et al. 2001; Jurafsky et al. 1998), before repetitions (Shriberg 1999) and other dysfluencies (Fox Tree & Clark 1997), or for words that are highly predictable from following words (Jurafsky et al. 2000). However, little is known about the effects of these local factors on duration when they occur in the context immediately preceding a word. This study examines the effects of three local factors in the preceding and following contexts of 17,000 tokens of phonetically transcribed content words from the Switchboard corpus of spoken American English. Two of the factors examined are known to affect duration in following context: utterance boundary and dysfluency. The third factor, the predictability of adjacent words as measured by their frequency and conditional probabilities, has not previously been examined in either preceding or following contexts. Determining whether these factors have prospective or retrospective effects on word duration can give us key insights into the role of context in lexical production timing.
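As an illustration of the kind of predictability measure referred to above, the conditional probability of a word given its preceding or following neighbor can be estimated from bigram counts. The sketch below is a minimal example under assumptions that are not taken from the study: the function name bigram_conditionals, the toy token list, and the unsmoothed maximum-likelihood estimate are all hypothetical choices made for illustration, and the study's own estimation procedure may well differ.

```python
from collections import Counter


def bigram_conditionals(tokens):
    """Build unsmoothed estimates of P(word | previous) and P(word | following).

    Hypothetical helper for illustration only; not the study's procedure.
    """
    # Count adjacent word pairs (first, second) in the token sequence.
    bigrams = Counter(zip(tokens, tokens[1:]))
    # How often each word occurs as the first or second member of a pair.
    as_first = Counter(tokens[:-1])
    as_second = Counter(tokens[1:])

    def p_given_prev(word, prev):
        # P(word | prev) = C(prev, word) / C(prev as first member)
        denom = as_first[prev]
        return bigrams[(prev, word)] / denom if denom else 0.0

    def p_given_next(word, nxt):
        # P(word | nxt) = C(word, nxt) / C(nxt as second member)
        denom = as_second[nxt]
        return bigrams[(word, nxt)] / denom if denom else 0.0

    return p_given_prev, p_given_next


# Toy usage on an invented token sequence.
p_prev, p_next = bigram_conditionals("the cat sat on the mat".split())
print(p_prev("cat", "the"))  # probability of "cat" given preceding "the"
print(p_next("on", "the"))   # probability of "on" given following "the"
```

Estimates of this kind, computed separately for the preceding and the following neighbor, are what allow the two directions of context to be compared as predictors.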

After controlling for other factors affecting duration, multiple regression analyses show asymmetrical effects of all three factors examined on word duration in preceding and following contexts. Utterance-final position is a significant predictor of lengthening (p<.001), but utterance-initial effects on word duration are very small. Duration is significantly increased by higher-frequency following words (p<.001) and by following words that are more predictable from their neighbors (p<.001), but the preceding word's probabilities do not show robust effects. Finally, following silent and filled pauses significantly predict increased word duration (p<.001) (cf. also Jurafsky et al. 1998; Bell et al. 2001 for function words), while preceding dysfluencies do not. To these results may be added the previous finding that predictability from following context is a stronger predictor of duration than predictability from preceding context (Jurafsky et al. 2001).

The consistent asymmetry of four factors (position in utterance, neighboring dysfluency, probability of the target word, and probability of neighboring words) suggests that these factors have a common origin. Current theories claim that relative timing is mediated by a hierarchical prosodic structure, and prosodic constituency may largely predict final lengthening (Ferreira 1994; Selkirk 1984). There is evidence that segmental content may also affect final lengthening (Meyer 1995). Our results suggest that, in addition, probabilistic factors of various kinds must play a role in determining the relative timing of speech events at low levels in the production architecture. Perhaps more significantly, models of production must take into account that timing adjustments are largely anticipatory. This suggests that timing is a dynamic process that continues to occur late in production, affecting both encoding and articulation.


AMLaP Conference, Saarbrücken, September 2001