The so-called Poverty of Stimulus problem [1], together with Gold's proof that natural languages cannot be learned without negative evidence [4], has traditionally been used in linguistic theories to claim that language acquisition requires pre-existing innate knowledge about grammar. More recent work has exposed some flaws in this line of reasoning. On the one hand, several developmental studies of language acquisition report that the language children initially produce closely reflects the language to which they have been exposed [7]. On the other hand, computational studies have shown that Gold's result does not apply to human languages, because it does not take into account the large amount of distributional information available in natural language. Studies [2,3,6] show that, owing to this distributional information, Simple Recurrent Networks can acquire artificial grammars containing many of the properties that, according to Gold's argument, would be unlearnable. More recently, Simple Recurrent Networks have been shown to acquire certain grammatical relations directly from real child-directed speech corpora [5].
We present two computational experiments in which Simple Recurrent Networks succeed in learning certain aspects of the real-life grammars of English and Dutch from large text corpora. In our experiments, we extended a variation of the architecture described in [6] to deal with the very large vocabularies of such corpora. We replaced the original localist, fully orthogonal representation with a quasi-orthogonal one. This allows us to handle large vocabularies with small networks while remaining very close to orthogonality. Moreover, we argue that this small deviation from orthogonality acts as random noise and improves the networks' learning performance. We test the representations obtained in the hidden layers on a range of grammatical phenomena in both languages, including long-distance relations, grammatical category distinctions and subtle inflectional details. Our experiments suggest that grammar can be learned without the need for any innate knowledge. One network was trained on a corpus of approximately 600,000 tokens of literary English; the other was trained on a corpus of 4,500,000 tokens from Dutch newspaper articles. The networks succeed in learning many grammatical phenomena from these realistic corpora without any artificial manipulation of the original statistical distributions. Our results also corroborate those of [6], in that no memory-span or language-complexity limitations are needed for the networks to learn.
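The contrast between a localist and a quasi-orthogonal word representation can be illustrated with a minimal sketch. The abstract does not specify how the quasi-orthogonal codes were constructed, so the scheme below (sparse random binary codes) and all names and parameters (`make_codes`, `dim`, `active`, the vocabulary size) are assumptions chosen purely for illustration. A localist code needs one unit per word, whereas here 2,000 words share 300 units and distinct words still overlap only slightly.

```python
import random
from itertools import combinations

def make_codes(vocab, dim=300, active=5, seed=0):
    """Assign each word a sparse random binary code.

    Hypothetical scheme: each word activates `active` of `dim` units,
    chosen at random. This is an assumption, not the authors' method.
    """
    rng = random.Random(seed)
    codes = {}
    for word in vocab:
        # Store only the indices of the active units.
        codes[word] = frozenset(rng.sample(range(dim), active))
    return codes

def cosine(a, b):
    """Cosine similarity between two binary codes given as index sets."""
    return len(a & b) / (len(a) * len(b)) ** 0.5

# A localist code would need 2000 units for this vocabulary.
vocab = [f"word{i}" for i in range(2000)]
codes = make_codes(vocab)

# Average similarity over pairs of distinct words (0 would be fully orthogonal).
sims = [cosine(codes[u], codes[v]) for u, v in combinations(vocab[:200], 2)]
mean_overlap = sum(sims) / len(sims)
print(f"units: 300, vocabulary: {len(vocab)}, mean cosine: {mean_overlap:.4f}")
```

In this sketch the mean cosine between distinct words is small but not zero, which is the sense in which such codes are close to orthogonal; the residual overlap plays the role of the small random noise mentioned above.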
References
[1] Chomsky, N. (1968). Language and Mind. Harcourt, Brace & World, New York
[2] Elman, J.L. (1990). Finding structure in time. Cognitive Science, 14:179-211
[3] Elman, J.L. (1993). Learning and development in neural networks: The importance of starting small. Cognition, 48:71-99
[4] Gold, E.M. (1967). Language identification in the limit. Information and Control, 10:447-474
[5] Lewis, J.D. and Elman, J.L. (2001). A connectionist investigation of linguistic arguments from the Poverty of Stimulus: Learning the unlearnable. Proceedings of the 23rd Annual Conference of the Cognitive Science Society
[6] Rohde, D.L.T. and Plaut, D.C. (1999). Language acquisition in the absence of explicit negative evidence: How important is starting small? Cognition, 72:67-109
[7] Tomasello, M. (2000). Do young children have adult syntactic competence? Cognition, 74:209-253