Published in Transactions of the Nebraska Academy of Sciences, Volume 2 (1973).
The purpose of this paper is to examine briefly two proposed extensions to linguistics of statistical/probabilistic methodology long familiar to the sciences. On the one hand, it will be argued that the invocation of probabilistic measures is indispensable to any sensible criterion of grammatical adequacy; on the other, it will be suggested that probabilistic automata are relevant to studies of language behavior.
1. A fully adequate (categorial/generative) grammar is one to which there corresponds an algorithm by means of which we can (recognize/generate) all and only the syntactically correct sequences of the corresponding language. At this writing, no such 'ideal' grammar exists for any natural language; and as long as this situation remains, the linguist must 'rank' competing grammars, both for their suitability to particular corpora and for their potential adequacy. Because of the prima facie promise of the transformational grammars introduced since the mid-1950s, linguists have made no rigorous attempt at providing a measure of the descriptive adequacy of grammars. Lately, such intuitive criteria as simplicity, intuitiveness, economy, and the like have been applied to competing grammars in adjudicating adequacy. But these are certainly not the kinds of objective criteria required by any independently valuable method of resolving disputes over relative adequacy. This is not to say that these quasi-criteria are without import for the linguist. Surely, in a ceteris paribus situation it is reasonable to prefer the simpler model to the more complex. But as yet there is no method of 'ranking' available by which we can determine when a ceteris paribus situation obtains. In linguistics, just as in the sciences, issues of simplicity, economy, and the like become germane only once the adequacies of the competing models have been established.
Certainly, the application of statistical/probabilistic procedures to the field of linguistics is not new. Precedents have been established in taxonomic studies, in analyses of the distributions of word types in corpora (viz. Zipf's Law), and elsewhere. But the notion of using an interjacent probabilistic grammar in determining descriptive adequacy is quite innovative. Of the recent developments in this area, perhaps the most notable is that of Suppes (1970). Suppes' motivation was the disregard by conventional grammatical models of such fundamental and universal characteristics of natural languages as relatively short utterance length, the predominance of grammatically simple utterances, and so on. It seems irrational to Suppes to tolerate grammars which pay an inordinate amount of attention to those syntactic structures which are 'deviant,' or at least atypical of general usage, and whose relative frequency of occurrence in the corpus is low. To put the matter differently, if any putatively adequate grammar is to be of value, it must account for a sizeable portion of the corpus, thereby identifying those grammatical types which demand further scrutiny. In order to establish the relative values of alternative grammars, Suppes suggests we consult a probabilistic grammar.
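Suppes' own formalization is not reproduced here, but the underlying idea can be sketched in latter-day terms: attach a probability to each rewriting rule of a grammar, so that every derivable utterance receives a probability and the grammar's fit to a corpus can be assessed. The following minimal sketch, in which the toy grammar, its rules, and all probabilities are invented purely for illustration, shows how such a grammar concentrates probability mass on short, simple utterances.

```python
from itertools import product

# A toy probabilistic grammar (all rules and probabilities invented for
# illustration): each nonterminal maps to a list of (probability, expansion)
# pairs, and the probabilities attached to a given nonterminal sum to 1.
GRAMMAR = {
    "S":   [(1.0, ["NP", "VP"])],
    "NP":  [(0.7, ["N"]), (0.3, ["Det", "N"])],
    "VP":  [(0.6, ["V"]), (0.4, ["V", "NP"])],
    "Det": [(1.0, ["the"])],
    "N":   [(0.5, ["dog"]), (0.5, ["cat"])],
    "V":   [(0.5, ["runs"]), (0.5, ["sees"])],
}

def expand(symbol):
    """Return every derivation of `symbol` as a (tokens, probability) pair."""
    if symbol not in GRAMMAR:                 # terminal symbol
        return [([symbol], 1.0)]
    derivations = []
    for rule_prob, expansion in GRAMMAR[symbol]:
        # Cross all derivations of the child symbols under this rule;
        # the derivation's probability is the product of the rule
        # probabilities used along the way.
        for combo in product(*(expand(child) for child in expansion)):
            tokens = [t for child_tokens, _ in combo for t in child_tokens]
            prob = rule_prob
            for _, child_prob in combo:
                prob *= child_prob
            derivations.append((tokens, prob))
    return derivations

sentences = expand("S")
# The grammar induces a probability distribution over its whole language,
# so the probabilities of all derivable sentences sum to 1.
total = sum(p for _, p in sentences)
# Short, simple utterances carry most of the mass: the four two-word
# sentences ("dog runs", etc.) account for 0.42 of it under this grammar.
short_mass = sum(p for tokens, p in sentences if len(tokens) == 2)
```

Given two candidate grammars, the one assigning the higher total probability to an observed corpus would, on this view, be the descriptively more adequate; the distribution itself also makes explicit which structures are typical and which are 'deviant.'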