We report on a study assessing the relative salience of the two discourse entities evoked by possessive NPs. Our analysis of salience and local coherence is cast in terms of Centering Theory (Grosz, Joshi and Weinstein 1995; Walker et al, 1998), which claims that the discourse entities in an utterance are RANKED, and that this ranking reflects their salience. The most widely accepted hypothesis about salience in centering is that ranking is primarily determined by grammatical function: subjects are ranked more highly than objects, and objects more highly than all other functions. This claim is supported both by psychological evidence (Hudson et al, 1986; Gordon et al, 1993; Brennan, 1995) and by corpus evidence (Poesio et al, 2000). However, the relative ranking of discourse entities introduced by NPs that do not occur as arguments or adjuncts of verbs (e.g., those introduced within possessive NPs) is an open question. Walker and Prince (1996) proposed the COMPLEX NP assumption (henceforth, WPH): the 'leftmost' NP is the most salient. (E.g., in an NP such as 'John's aunt', the possessor, 'John', is the more salient entity.) This hypothesis was not tested by Walker and Prince, and was challenged by the experiments of Gordon and colleagues (1999), which showed that the entity possessed (the aunt) was the more salient (henceforth, G+H).
We compared the two hypotheses using the corpus annotated by Poesio et al (2000) (which includes about 4,000 NPs, of which 250 are possessives) and the British National Corpus (BNC). The first evaluation involved measuring the number of violations of the three main principles of Centering Theory (Poesio et al, 2000). This test revealed that WPH leads to significantly fewer violations than G+H. The second evaluation involved measuring how well each ranking predicted subsequent anaphoric reference (Arnold, 1998). In about two thirds of the cases both possessor and possessee were subsequently referred to; but when only one of the two discourse entities was, WPH scored significantly better than G+H: the possessor was the entity referred to in 90% of these cases.
A likely explanation of the discrepancy between these results and those of Gordon et al is that their materials included only possessive NPs in which BOTH entities were animate, such as 'John's aunt'. Our frequency counts with the BNC (using WordNet to infer the animacy of an NP) suggest that in 3/4 of possessive NPs (more than 4000 instances randomly selected) the possessor is animate but the possessee is not. This suggests that the actual ranking function could be similar to the one proposed by Di Eugenio (1998), which is also untested: in a possessive NP an animate possessor is always ranked first; should both referents be animate (as in 'John's aunt'), the referent of the whole NP (the aunt) is ranked more highly (henceforth, DEH). We found that DEH performs almost as well as WPH, and captures most of the cases that G+H would predict; and that in most of the cases where WPH is a better predictor of ranking, the possessor is a pronoun. We therefore revised Di Eugenio's hypothesis by adding the condition that the possessor is more highly ranked if it is pronominalized (henceforth, PNH). PNH performs significantly better than all other hypotheses under both metrics, and correctly predicts subsequent reference in all but 2 cases (out of 300).
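The four ranking hypotheses compared above can be illustrated with a minimal sketch. This is not the procedure used in the study: the class `Possessive`, the function names, and the toy examples are all hypothetical, introduced only for illustration. The sketch also makes two assumptions that the text does not settle: DEH is read as ranking an animate referent above an inanimate one, and when neither referent is animate the sketch falls back on the leftmost NP.

```python
from dataclasses import dataclass

POSSESSOR, POSSESSEE = "possessor", "possessee"


@dataclass(frozen=True)
class Possessive:
    """Hypothetical record for one possessive NP, e.g. "John's aunt"."""
    possessor_animate: bool
    possessee_animate: bool
    possessor_pronoun: bool = False


def rank_wph(np: Possessive) -> str:
    # WPH: the leftmost NP (the possessor) is ranked first.
    return POSSESSOR


def rank_gh(np: Possessive) -> str:
    # G+H: the entity possessed (referent of the whole NP) is ranked first.
    return POSSESSEE


def rank_deh(np: Possessive) -> str:
    # DEH: if both referents are animate, the possessee is ranked first.
    if np.possessor_animate and np.possessee_animate:
        return POSSESSEE
    # ASSUMPTION: otherwise an animate referent outranks an inanimate one.
    if np.possessor_animate:
        return POSSESSOR
    if np.possessee_animate:
        return POSSESSEE
    # ASSUMPTION: neither referent animate, fall back on the leftmost NP.
    return POSSESSOR


def rank_pnh(np: Possessive) -> str:
    # PNH: a pronominalized possessor is ranked first; otherwise as in DEH.
    if np.possessor_pronoun:
        return POSSESSOR
    return rank_deh(np)


def accuracy(rank, cases) -> float:
    """Fraction of cases where the top-ranked entity is the one referred to next."""
    hits = sum(1 for np, referred in cases if rank(np) == referred)
    return hits / len(cases)


# Toy, invented cases (NOT data from the study): (possessive NP, entity referred to next).
toy_cases = [
    (Possessive(True, False), POSSESSOR),        # "John's car"
    (Possessive(True, True), POSSESSEE),         # "John's aunt"
    (Possessive(True, True, True), POSSESSOR),   # "his aunt"
    (Possessive(True, False, True), POSSESSOR),  # "her book"
]

for name, rank in [("WPH", rank_wph), ("G+H", rank_gh),
                   ("DEH", rank_deh), ("PNH", rank_pnh)]:
    print(name, accuracy(rank, toy_cases))
```

On these invented cases the functions differ only where the hypotheses differ: WPH and G+H ignore animacy entirely, DEH departs from WPH only when both referents are animate, and PNH departs from DEH only when the possessor is a pronoun.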
These results might be considered yet another confirmation of previous results
suggesting that animate entities are more prominent than inanimate ones (Bock,
1982; Byrne and Davidson, 1995; Fraurud and Dahl, 1996; Prat-Sala and Branigan,
1999); it is interesting to note, however, that none of the ranking functions
proposed in Centering takes animacy into account. More generally, this work
supports the view that analyses of linguistic use via corpora can usefully
complement psychological evidence about language processing, ensuring that all
relevant factors are considered in the experimental materials.