The important role of the following local context on word duration in production is well established. Word durations are longer for words that occur in phrase-final position (Klatt 1975, inter alia), pre-pausally (Bell et al. 2001; Jurafsky et al. 1998), before repetitions (Shriberg 1999) and other dysfluencies (Fox Tree & Clark 1997), or for words that are highly predictable from following words (Jurafsky et al. 2000). However, little is known about the effects of these local factors on duration when they occur in the context immediately preceding a word. This study examines the effects of three local factors in the preceding and following contexts of 17,000 tokens of phonetically transcribed content words from the Switchboard corpus of spoken American English. Two of the factors examined are known to affect duration in following context: utterance boundary and dysfluency. The third factor, the predictability of adjacent words as measured by their frequency and conditional probabilities, has not previously been examined in either preceding or following contexts. Determining whether these factors have prospective or retrospective effects on word duration can give us key insights into the role of context in lexical production timing.
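As an illustration of the kind of predictability measure mentioned above, the following is a minimal sketch of estimating a word's conditional probability given its preceding and its following neighbor from raw bigram counts. The function name, the toy token list, and the unsmoothed relative-frequency estimate are all assumptions made for illustration; the study's actual estimation procedure is not described here.

```python
from collections import Counter

def neighbor_probabilities(tokens):
    """Return (word, P(word | previous word), P(word | following word)) per token.

    Hypothetical helper: probabilities are unsmoothed relative frequencies,
    count(bigram) / count(neighbor), estimated from the token list itself.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    rows = []
    for i, word in enumerate(tokens):
        prev_word = tokens[i - 1] if i > 0 else None
        next_word = tokens[i + 1] if i < len(tokens) - 1 else None
        # Probability of the word given the word that precedes it.
        p_given_prev = (
            bigrams[(prev_word, word)] / unigrams[prev_word]
            if prev_word is not None else None
        )
        # Probability of the word given the word that follows it.
        p_given_next = (
            bigrams[(word, next_word)] / unigrams[next_word]
            if next_word is not None else None
        )
        rows.append((word, p_given_prev, p_given_next))
    return rows

rows = neighbor_probabilities("the cat saw the dog".split())
```

In a real analysis the counts would come from a large corpus and would normally be smoothed; measures like these, together with word frequency, could then serve as predictors in a regression on word duration.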
After controlling for other factors affecting duration, multiple regression analyses show asymmetrical effects of all three factors examined on word duration in preceding and following contexts. Utterance-final position is a significant predictor of lengthening (p<.001), but utterance-initial effects on word duration are very small. Duration is significantly increased by higher-frequency following words (p<.001) and by following words that are more predictable from their neighbors (p<.001), but the preceding word's probabilities do not show robust effects. Finally, following silent and filled pauses significantly predict increased word duration (p<.001) (cf. also Jurafsky et al. 1998; Bell et al. 2001 for function words), while preceding dysfluencies do not. To these results may be added the previous finding that predictability from following context is a stronger predictor of duration than predictability from preceding context (Jurafsky et al. 2001).
The consistent asymmetry of four factors (position in utterance,
neighboring dysfluency, probability of the target word, and
probability of neighboring words) suggests that these factors have a
common origin. Current theories claim that relative timing is mediated
by a hierarchical prosodic structure, and prosodic constituency may
largely predict final lengthening (Ferreira 1994; Selkirk 1984). There
is evidence that segmental content may also affect final lengthening
(Meyer 1995). Our results suggest that, in addition, probabilistic
factors of various kinds must play a role in determining the relative
timing of speech events at low levels in the production
architecture. Perhaps more significantly, models of production must
take into account that timing adjustments are largely
anticipatory. This suggests that timing is a dynamic process that
continues to occur late in production, affecting both encoding and
articulation.