The Role of Frequency in ELT: New Corpus Evidence Brings a Re-appraisal Geoffrey Leech, Lancaster University
1. Why is frequency important?
My subject in this paper is the role of frequency in helping to determine teaching priorities in English language teaching. On the one hand, it seems to be a matter of common sense to teach words or forms which are frequent before those which are infrequent or rare. On the other hand, I feel that over the past generation the topic of frequency has been neglected in the teaching of languages, although it has started to reclaim attention in the last few years. There are also problems, both of theory and practice, relating to frequency.
First, what is the point of frequency? Why is it valuable, in particular, for the language teacher? I claim that it is valuable to build frequency considerations into one's curriculum, one's syllabus, one's teaching materials, and one's classroom teaching. If an item naturally occurs frequently in the language being taught, it is likely to be important also for the target behaviour of the learner: the learner will later often come across that item in reading and listening, and will often need to use it in communicating with others. And yet, frequency has been largely ignored, for three reasons.
The first reason is that until recently, knowledge of the frequency of items in a language has been very limited. To consider why, we need to ask: How do we find out about frequency? Information about frequencies of words, expressions, and grammatical structures can be gained from a large sample of texts, i.e. a corpus, of the language concerned, and of course the computer is indispensable to this work, which may involve sifting through tens or hundreds of millions of words. Such corpora of language data have been increasingly compiled over the past 30 years, but are only now being seriously applied to pedagogical purposes. But the breakthrough is being made, particularly in dictionaries. The major English-language dictionaries for advanced learners, such as the Oxford Advanced Learner's Dictionary, the Collins Cobuild Dictionary, and especially the Longman Dictionary of Contemporary English (LDOCE), now take account of frequency information about items of vocabulary. For example, the senses of words are placed in order of frequency, and the American English edition of LDOCE (Longman Advanced American Dictionary, 2000) provides little ‘frequency boxes’ alongside important words, giving their frequency rating in spoken and in written English.
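[Figure 1: LDOCE frequency boxes for return (verb) and return (noun), each giving a rating for spoken (S) and written (W) English. Both are rated W1; the spoken ratings shown are S2 and S3.]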
As an example, the boxes in Figure 1 inform us that return as a verb and return as a noun are both very frequent in written English (‘1’ means that they are in the top one thousand words), but are not quite so frequent in speech (‘2’ = in the top two thousand words, and ‘3’ = in the top three thousand words). The same dictionary provides occasional bar charts, contrasting (for example) the different frequencies in American English and British English of the near-synonyms rubbish, garbage and trash. This kind of information is now making an impact in lexicography because publishers have invested a great deal of time, effort and money in building and using such large electronic text corpora of both spoken and written language. So useful knowledge about frequency is now at last becoming available. To give some recently available frequency data on general English, I will make reference in this paper to two books:
Biber, D., Johansson, S., Leech, G., Conrad, S. and Finegan, E. (1999), Longman Grammar of Spoken and Written English. London: Longman. (henceforth LGSWE)
Leech, G., Rayson, P. and Wilson, A. (2001), Word Frequencies in Written and Spoken English, based on the British National Corpus. London: Longman. (henceforth WFWSE)
(The former of these books gives information on grammatical frequency, and the latter gives information on word or lexical frequency.)
The second reason for the neglect of frequency is that specialists in applied linguistics have not given much attention to it since the 1950s. Fifty years ago, frequency was quite a popular topic with leaders of opinion in ELT. People like Michael West, who compiled the General Service List of English Words (Longman, 1953), spent years, with teams of helpers, counting the frequency of words in many texts. That was before the age of computers: so, the work of obtaining frequency information by hand was extremely time-consuming and boring, and moreover, since there were no tape recorders in those days, it was restricted to written language. So this work was of limited application, and applied linguists have since then given more attention to more interesting topics, such as how people learn languages. The focus turned to the process and techniques of learning and teaching, rather than course content. It is now instructive to look at the most influential textbooks on applied linguistics over the past 30 years, such as Rod Ellis's The Study of Second Language Acquisition (1994), and to notice how little attention is given to frequency, and how little enthusiasm is shown for it. Ellis wrote:
Overall, there is little evidence to support the claim that input frequency affects L2 acquisition, but there is also little evidence to refute it. Perhaps the safest conclusion is that input frequency serves as one of the factors influencing development, often combining with other factors such as L1 transfer and communicative need. (ibid. 272-3)
This is one of the very few passages in that long and highly informative book where Ellis discusses frequency. But looking closely, we see that Ellis is discussing input frequency - the frequency with which learners are exposed to language items in the classroom - rather than frequency in the language in general use. He is attending to frequency as an input to learning, whereas I want to focus on frequency as a factor steering the outcome, assuming that the ultimate goal of learning is to obtain a communicative competence in the language.
I found one other general textbook which gives more attention to this subject: van Els et al., Applied Linguistics and the Learning and Teaching of Foreign Languages. In discussing the selection and gradation of course content, the authors mention frequency in language use as the first consideration in determining what should be taught and when, for example in selecting vocabulary. But in addition they mention other criteria, such as:
1. Range or dispersion
2. Coverage
3. Learnability
4. Communicative need
2. Difficulties and competing factors
Here we come to the third reason for the neglect of frequency: it is actually not such a straightforward idea, because there are difficulties in applying it, both in principle and in practice. This will emerge during the discussion of the above four criteria. In what follows, I will discuss the criteria concentrating initially on vocabulary selection, as the easiest case, and will later give more attention to frequency of grammatical phenomena.
a. Range or dispersion (from now on I will use the single term ‘dispersion’) means how well the item is distributed throughout the use of the language, for example in different texts and text types. To study this objectively, we have to return to the idea of a sample corpus of texts, bearing in mind that ‘texts’ in this sense include both written texts and transcriptions of speech. Thus in the British National Corpus (BNC), one of the major corpora that can be used for frequency studies on English and the corpus on which WFWSE is based, the noun influence occurs with the same frequency as the noun software, but software has a lower dispersion, i.e. is less well distributed throughout the corpus. So by that criterion, software is a less useful word for learners in general, although it may be particularly useful for learners of English for computing and technology.¹ (For those interested to know how the distributional spread of a word in a corpus is measured, the easiest measure to use is range, which simply means that the corpus is randomly divided into (say) 100 equal parts, to find out how many of them contain the word in question. Dispersion is a more sensitive measure, based on a statistical formula known as Juilland’s D.²) Hence frequency and dispersion can be judiciously combined to give a measure of what vocabulary is more central to the language (‘core vocabulary’) and what vocabulary is more peripheral or specialized.

¹ This example incidentally provides a warning that corpora may go out of date rather quickly. The BNC was compiled in the early 1990s. It is quite possible that this result would not be found in a corpus collected today, when software has become more of an everyday word.
² The formula is given in WFWSE, p.18. See also Lyne (1985).
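The two measures of distributional spread described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used for WFWSE: the toy `parts` data and the function names are invented, the corpus is assumed to be already divided into equal-sized parts, and the formula used for Juilland's D is the commonly cited form D = 1 - V / sqrt(n - 1), where V is the coefficient of variation of the word's counts across the n parts. The exact formulation in WFWSE (p.18) should be consulted before relying on it.

```python
import math
from typing import List

def word_range(parts: List[List[str]], word: str) -> int:
    """Range: the number of corpus parts in which the word occurs at least once."""
    return sum(1 for part in parts if word in part)

def juilland_d(parts: List[List[str]], word: str) -> float:
    """Juilland's D in its commonly cited form (an assumption of this sketch):
    D = 1 - V / sqrt(n - 1), with V = standard deviation / mean of the
    per-part counts. Assumes at least two parts of equal size."""
    counts = [part.count(word) for part in parts]
    n = len(counts)
    mean = sum(counts) / n
    if mean == 0:
        return 0.0  # the word never occurs, so there is no spread to measure
    variance = sum((c - mean) ** 2 for c in counts) / n  # population variance
    v = math.sqrt(variance) / mean
    return 1 - v / math.sqrt(n - 1)

# Invented toy corpus of four equal-sized parts (hypothetical data).
parts = [
    ["the", "answer", "is", "here"],
    ["an", "answer", "was", "given"],
    ["new", "software", "and", "software"],
    ["no", "answer", "at", "all"],
]

for w in ("answer", "software"):
    print(w, word_range(parts, w), round(juilland_d(parts, w), 2))
```

In this toy data a word confined to a single part scores a D of about 0, while a word spread perfectly evenly over all parts would score 1, which is why D discriminates more finely than range alone.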
b. Coverage: This is another measure of what might be called 'coreness' of vocabulary: words with wide coverage are more useful to the learner than words with narrow coverage. We can distinguish two types of coverage: coverage of meaning and coverage of register or style. Coverage of meaning can be illustrated by the two verbs give and donate. Give is a word of wider semantic coverage than its partial synonym donate, and we can check on this by looking up the words in the dictionary, and noting how many different senses give has, compared with donate. Coverage of register or style overlaps with dispersion, and refers to the extent to which a word is likely to occur in different varieties of the language. For example, the adjective nice is over 8 times as frequent in speech as in writing. This measure suggests that the word nice, although extremely useful in speech, is far less useful in writing - a factor we might want to take into account in designing core vocabularies for teaching purposes. The opposite is true of thus, which is more than 20 times more common in writing than in speech. We can contrast these words, which we can consider colloquial and formal words respectively, with a word like came, which is approximately equally common in both speech and writing, and in that sense has a more balanced stylistic coverage than nice and thus.
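A comparison of this kind rests on normalizing raw counts to a common base, since spoken and written samples are rarely the same size. The sketch below shows the arithmetic only. The function names and all counts in it are invented for illustration and are not BNC figures.

```python
import math

def per_million(count: int, corpus_size: int) -> float:
    """Normalize a raw count to occurrences per million word tokens."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return count * 1_000_000 / corpus_size

def register_ratio(spoken_count: int, spoken_size: int,
                   written_count: int, written_size: int) -> float:
    """Ratio of spoken to written frequency, both normalized per million.
    A value above 1 means the word is relatively more common in speech."""
    spoken = per_million(spoken_count, spoken_size)
    written = per_million(written_count, written_size)
    if written == 0:
        return math.inf  # word attested only in the spoken sample
    return spoken / written

# Hypothetical counts: 800 hits in a 1-million-word spoken sample and
# 900 hits in a 9-million-word written sample.
ratio = register_ratio(800, 1_000_000, 900, 9_000_000)
print(ratio)
```

Although the raw written count is higher here, the normalized ratio shows the word to be several times more frequent in speech, which is the sense in which a word's stylistic coverage is called unbalanced.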
In addition to these more or less objective factors, van Els et al. mention psychological and didactic criteria, especially the criterion of learnability, which we now briefly consider.
c. Learnability. No doubt some words are more 'learnable' than others, i.e. for one reason or another students will find them easier to learn. One source of difficulty may be that the word has irregular forms: e.g. the noun corpus I have used here often occurs with the rare Latin plural corpora, which makes it more difficult to learn in this respect than most English nouns. Other factors of difficulty for the learner include cognitive complexity, which can be more easily illustrated in the grammatical sphere. For example, psycholinguistic studies have shown that passive constructions are more difficult to process than active constructions, and that negative constructions are more difficult to process than positive ones (see Clark and Clark 1977: 105, 240-1; also Wason 1962). This is not surprising, and no teacher would dream of teaching passive sentences before active ones, or negative sentences before positive ones.
d. Communicative need. For many teachers, this will be considered the overriding criterion of selection, although it is somewhat difficult to determine. Whereas in the earlier stages of learning, communicative need will be governed by the developing requirements of the curriculum, in a longer perspective it will be determined by the general goals of language learning for speaking and listening, writing and reading, with needs analysis yielding different priorities for different categories of students, such as those learning English for Academic Purposes or English for Specific Purposes.
From this list of factors - frequency, range, coverage and learnability - it appears that high frequency is just one of the variables that lead to the prioritization of an item in the language learning process. But an important thing to notice is that all of the other factors are strongly associated or correlated with frequency. Consider dispersion: my work with WFWSE has shown me that it is in fact quite difficult to find items in a frequency list where greater frequency is not significantly associated with a greater dispersion. But there are counterexamples. One counterexample I found is the pair of nouns answer and animal:
              Frequency per million    Dispersion index
answer (n.)   124                      0.93
animal (n.)   153                      0.90
The explanation of this case seems to be that answer is a more generally employed abstract noun, whereas animal, as a concrete noun, although more common, is topic-related, and therefore more unevenly distributed. In general, nouns are more topic-related than other parts of speech, and accordingly have a lower dispersion than their frequency might lead one to expect.
Next, consider coverage: here is a small list of verbs of more general coverage (in register and/or meaning) matched with partial synonyms of more restricted coverage. It is obvious that the general-coverage verbs are very much more frequent than the more restricted (and more formal) verbs.
give 1284 per million donate 10 per million
want 945 per million desire 14 per million
build 230 per million erect 15 per million
hide 64 per million conceal 17 per million
As for learnability, if we associate one kind of learning difficulty with morphological complexity of words, there is a well-known law, Zipf's law or principle of least effort (Zipf 1935, 1949), which states among other things that the more complex a word, the less frequent it will be. This intuitively obvious point is confirmed by the following BNC data on complexity (in number of syllables) and frequency from WFWSE:
Most common 1-syllable word: the (61847 per million)
Most common 2-syllable word: into (1634 per million)
Most common 3-syllable word: government (622 per million)
Most common 4-syllable word: information (386 per million)
Most common 5-syllable word: international (221 per million)
Most common 6-syllable word: responsibility (93 per million)
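The trend in the figures just listed can be checked mechanically. The short sketch below simply re-enters the six values quoted above and tests that frequency falls at every step as syllable count rises. The variable and function names are invented for illustration, and six data points are of course a demonstration rather than a statistical test of Zipf's principle.

```python
from typing import Dict, List, Tuple

# Most common word for each syllable count, with frequency per million,
# as quoted in the list above.
most_common: Dict[int, Tuple[str, int]] = {
    1: ("the", 61847),
    2: ("into", 1634),
    3: ("government", 622),
    4: ("information", 386),
    5: ("international", 221),
    6: ("responsibility", 93),
}

def is_strictly_decreasing(values: List[int]) -> bool:
    """True if every value is smaller than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))

# Frequencies ordered by increasing syllable count.
freqs = [most_common[n][1] for n in sorted(most_common)]
print(is_strictly_decreasing(freqs))

# Show how sharply frequency drops from one syllable count to the next.
for (n, (word, f)), nxt in zip(sorted(most_common.items()), freqs[1:]):
    print(f"{n} -> {n + 1} syllables: {word} is {f / nxt:.1f} times more frequent")
```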
It is clear that in this purely formal sense frequency and learnability correlate. On the level of syntax, consider again passives. Passive verb phrases are far less frequent than active ones: the highest percentage of passives is found in academic writing, where they amount to over 20% of all verbs. The