Unsupervised Learning of Verb Classes from Lexical Statistics

Paola Merlo (1) and Suzanne Stevenson (2)
merlo@lettres.unige.ch, suzanne@cs.toronto.edu
(1) University of Geneva
(2) University of Toronto

Many current models of sentence processing postulate the on-line use of rich syntactic and semantic information about verbs (e.g., MacDonald et al. 1993, Trueswell 1996, Garnsey et al. 1994), raising important questions concerning how such information is acquired. One proposal, the Syntactic Bootstrapping Hypothesis, suggests an iterative process in which syntactic and semantic knowledge about nouns and verbs build on each other over exposure to the linguistic environment. A central claim is that the acquisition of a verb's meaning is constrained by the verb's linguistic contexts: its subcategorisation frames (Gleitman 1990) and its argument structure (Gillette et al. 1999). However, it is left unspecified how the learner acquires this information.

Earlier computational work has demonstrated the induction of subcategorisation information from simple corpus-based statistics (Brent 1993). By contrast, the acquisition of argument structure has depended on fairly rich semantic information (e.g., Allen 1997). In the spirit of the Syntactic Bootstrapping Hypothesis (i.e., that simple syntactic knowledge can be informative in lexical acquisition), we show here that some important argument structure distinctions can also be learned from simple statistics that are easily extractable from a corpus.

We focus on the notion of a lexical class as a means for implicitly learning the properties of a verb. That is, if verbs that share argument structure properties are treated as a class, then grouping verbs into coherent classes corresponds to inducing argument structure information. We performed computational experiments to verify that such verb classes can be induced from frequencies of parts of speech, subcategorisation frames and superficial properties of a verb's noun arguments.
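As an illustration of what one such frequency feature might look like, the sketch below computes, for each verb, the proportion of its past-form uses tagged VBN (past participle) rather than VBD (simple past). It is a minimal sketch under stated assumptions, not the extraction procedure used in the experiments: the input format (lemma, tag pairs), the function name vbn_proportion and the toy sample are all hypothetical.

```python
from collections import Counter, defaultdict


def vbn_proportion(tagged_tokens):
    """For each verb lemma, return the share of its VBN/VBD uses tagged VBN.

    `tagged_tokens` is assumed to be an iterable of (lemma, tag) pairs;
    tokens with any other tag are ignored.
    """
    counts = defaultdict(Counter)
    for lemma, tag in tagged_tokens:
        if tag in ("VBN", "VBD"):
            counts[lemma][tag] += 1
    return {
        lemma: c["VBN"] / (c["VBN"] + c["VBD"])
        for lemma, c in counts.items()
    }


# Invented toy sample, for illustration only (not corpus data).
sample = [
    ("race", "VBD"), ("race", "VBN"), ("race", "VBD"), ("race", "VBD"),
    ("open", "VBN"), ("open", "VBD"),
    ("play", "VBN"), ("play", "VBN"), ("play", "VBD"),
    ("play", "NN"),  # not a past-form verb use, so it is skipped
]
features = vbn_proportion(sample)
```

A subcategorisation-frame frequency could be computed in the same way, by counting frame labels per verb instead of tags.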

To approximate the human learning setting where no explicit training is provided, we used hierarchical clustering as an unsupervised method for inducing class membership. We tested numerous combinations of statistical features that have been shown, using supervised methods, to successfully discriminate the verb classes (Merlo and Stevenson 2001). In our unsupervised experiments, the features gave rise to 3 balanced clusters corresponding to our 3 sample verb classes, with an overall best accuracy of 69%, comparable to the performance achieved with supervised training methods.
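The general shape of such an experiment can be sketched as follows: verbs are represented as vectors of frequency features, the two closest clusters are merged repeatedly until three remain, and each cluster is then scored against the known classes. This is only a sketch under assumptions. The abstract does not state the linkage criterion, the distance measure or how accuracy was computed, so average linkage, Euclidean distance and majority-label scoring are choices made here for illustration; the feature vectors and the class labels "A", "B", "C" are invented.

```python
from collections import Counter
from itertools import combinations
from math import dist


def agglomerate(points, k):
    """Bottom-up clustering: merge the two closest clusters until k remain.

    Cluster distance is average linkage over Euclidean distances
    (an assumption; the source does not specify either choice).
    Returns a list of clusters, each a list of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # combinations() yields position pairs with a < b, so deleting
        # position b after the merge leaves position a valid.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] += clusters[b]
        del clusters[b]
    return clusters


def majority_accuracy(clusters, gold):
    """Label each cluster with its most frequent gold class and
    return the fraction of items that match their cluster's label."""
    correct = sum(
        Counter(gold[i] for i in c).most_common(1)[0][1] for c in clusters
    )
    return correct / len(gold)


# Invented two-feature vectors for six hypothetical verbs.
vectors = [(0.10, 0.20), (0.15, 0.25), (0.50, 0.80),
           (0.55, 0.75), (0.90, 0.10), (0.85, 0.15)]
gold = ["A", "A", "B", "B", "C", "B"]  # hypothetical class labels

clusters = agglomerate(vectors, k=3)
accuracy = majority_accuracy(clusters, gold)
```

The gold labels are used only for scoring after the fact; the clustering step itself sees nothing but the feature vectors, which is what makes the procedure unsupervised.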

The success of the clustering experiments suggests that argument structure information associated with verb classes can be learned using only statistics over simple syntactic cues, and thus could in fact contribute to the acquisition of verb semantics in a process of the Syntactic Bootstrapping type. Moreover, our best accuracy was achieved using only two frequencies, part-of-speech (VBN/VBD) and subcategorisation bias, both of which have been shown to influence ambiguity resolution (Trueswell 1996 and Garnsey et al. 1994, respectively). We discuss the implications of our results for the connections between the role of statistics in acquisition and in syntactic disambiguation.


AMLaP Conference, Saarbrücken, September 2001