Vocabulary Building in the Perseus Digital Library
Jeffrey A. Rydberg-Cox, Anne Mahoney
February 6, 2002
1 Introduction
Vocabulary acquisition is a particularly vexed question for intermediate students of Greek and Latin. Cognitive studies of language acquisition suggest that a word must be encountered between 6 and 20 times before a student can be said to know that word (Nation [1990], p. 43-45; Parry [1997]; Coady [1997]; Hulstijn [1997]). At the same time, encountering the words on general vocabulary lists or flash cards of common words intended for rote memorization does not help; retention rates from lists that are not connected to reading are extremely low. Students must encounter new words in context in order to retain their meanings (cf. Perry [1998], p. 111-112). Unfortunately, the usual distribution of words in texts makes it difficult if not impossible for students to learn vocabulary simply by reading. Vocabulary is distributed in literary texts according to a phenomenon known as Zipf’s law. This law states that the most common words in any text will be “function words,” like the definite article, pronouns, conjunctions, or common prepositions, while most words in a text will appear very few times and the vast majority will appear only once (Zipf [1935], Baayen [2001]). In the course of normal reading, therefore, students will not encounter a word in context often enough to add it to their active vocabulary. One solution to this problem is linking reading assignments to vocabulary lists so that students can efficiently study the words that they are encountering in their readings. In this paper, we will describe a new computational tool in the Perseus Digital Library designed to help students learn vocabulary by generating Latin and Greek word lists that are tailored to reading assignments.
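The skewed distribution described above is easy to observe even in a tiny sample. The following minimal Python sketch is purely illustrative and is not part of the Perseus tool: the function name `frequency_profile` and the sample phrase are assumptions made for the example, and it counts surface forms rather than dictionary words.

```python
from collections import Counter
import re


def frequency_profile(text):
    """Count word tokens and list the words that occur exactly once.

    Hypothetical helper for illustration; it works on surface forms,
    not on dictionary headwords.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    hapaxes = [word for word, n in counts.items() if n == 1]
    return counts, hapaxes


if __name__ == "__main__":
    sample = "et in terra et in caelo et in mari pax"
    counts, hapaxes = frequency_profile(sample)
    # Function words dominate the token count ...
    print(counts["et"], counts["in"])  # 3 3
    # ... while most distinct words occur only once.
    print(len(hapaxes), "of", len(counts), "distinct words occur once")  # 4 of 6
```

Even in this ten-word sample, two function words supply six of the tokens, while four of the six distinct words appear a single time.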
Of course experienced teachers know that it takes more than just vocabulary to read a language, and especially to read literary language, a point made strongly by Kitchell [2000]. In addition, familiar words can be used in unexpected ways, and obscure words can be crucial for the meaning of a passage (as argued by Bull [1948-1949]). Nonetheless, vocabulary is one part of language learning, and is all we focus on here.
2 The Perseus Vocabulary Tool and How It Works
The Perseus Digital Library (www.perseus.tufts.edu) now includes a vocabulary tool that can generate several different kinds of vocabulary lists for Greek or Latin texts. Whenever a text is displayed in the Perseus Digital Library, the library also offers a link to a basic vocabulary list of the words that appear in that text. Users can also create a list for more than one work using a separate interface that displays all the Greek or Latin works in a Perseus collection. You can also create vocabulary lists for smaller sections of texts that can be logically divided into smaller units such as the books of Virgil’s Aeneid or Herodotus’ History. Thus, a Greek survey course could create a list containing the vocabulary for Book 1 of the Iliad, Demosthenes’ Against Neaera, and Aeschylus’ Agamemnon. Likewise, an Advanced Placement Latin course might construct a vocabulary list for Cicero’s Pro Caelio and selected poems of Catullus, or for the relevant sections of the Aeneid (see the sample in figure 1).
The custom vocabulary list interface allows readers to change the way that the list appears with several different sort, filtering, and output options. First, it is possible to sort the list either alphabetically or by word frequency. Sorting in alphabetical order produces a traditional word list, convenient for looking up words while reading the text. Sorting by frequency puts the most common words at the top of the list, making it easy for students to see the most basic words they need for a text.
Counting word frequencies for Greek and Latin texts is more complicated than it might seem. The current version of the Perseus morphological analyzer, described in Crane [1991], makes no attempt to disambiguate forms that can be derived from more than one lexicon entry. In English, for example, the word “flies” might come from the verb “to fly” or the noun “a fly,” but the word “flew” is unambiguously a form of “to fly.” Word forms that are ambiguous are included in the maximum count for each dictionary word they might belong to, while unambiguous forms are included in both the minimum and maximum counts. We also calculate a weighted frequency that attempts to show whether the actual frequency count for a word would be closer to the minimum or maximum frequency score. Note that a minimum count of zero means that every instance of the word in the text is ambiguous and that the word may not actually appear in the text at all. For example, an English text about airplanes may contain forms of the verb “to fly” but no mention of the insect called “fly,” yet “a fly” may appear in the vocabulary list because the form “flies” might have come from that word.
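The minimum and maximum counts can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Perseus implementation: the function `count_lemmas` and the toy analysis table are hypothetical, and because the formula for the weighted frequency is not given here, the sketch simply splits each ambiguous form evenly among its candidate dictionary words.

```python
from collections import defaultdict


def count_lemmas(tokens, analyses):
    """Return {lemma: (minimum, maximum, weighted)} for a list of word forms.

    `analyses` maps each surface form to the set of dictionary words it
    could belong to (a stand-in for a morphological analyzer's output).
    """
    minimum = defaultdict(int)
    maximum = defaultdict(int)
    weighted = defaultdict(float)
    for form in tokens:
        candidates = analyses.get(form, set())
        for lemma in candidates:
            # Every possible analysis counts toward the maximum.
            maximum[lemma] += 1
            # Only unambiguous forms count toward the minimum.
            if len(candidates) == 1:
                minimum[lemma] += 1
            # ASSUMPTION: an even split among candidates; the actual
            # weighting used by the tool is not specified in this paper.
            weighted[lemma] += 1.0 / len(candidates)
    return {lemma: (minimum[lemma], maximum[lemma], weighted[lemma])
            for lemma in maximum}


if __name__ == "__main__":
    toy_analyses = {
        "flies": {"fly (verb)", "fly (noun)"},
        "flew": {"fly (verb)"},
    }
    result = count_lemmas(["flies", "flew", "flies"], toy_analyses)
    print(result["fly (verb)"])  # (1, 3, 2.0)
    print(result["fly (noun)"])  # (0, 2, 1.0)
```

The noun ends up with a minimum of zero and a maximum of two, which is exactly the situation described above: it is listed only because “flies” might have come from it.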
The tool also allows two different mechanisms for viewing the list: you can choose a table that will provide attractive output in a web browser, or a comma-delimited list that you can import into other software programs such as a spreadsheet or database. Finally, the vocabulary tool allows you to select the percentage of the words in a document that you want to include in your list. As with the sort orders, the different percentages are useful for different purposes. Since the vast majority of words in any text appear only once, vocabulary lists showing all words in a document can be quite long. A complete list is precisely what is wanted for comprehensive review or a “mini-lexicon” for a selection of works. If, on the other hand, you want to give students a list that contains the essential vocabulary for the selected texts, you can include only the most frequent words, which together account for a high percentage of the words in the text.
Consider Ovid’s Metamorphoses. The text is over 78,000 words long, but Ovid uses only 8,789 different words. Of those, 3,644 (almost half) appear only once. The most frequent words in the Metamorphoses are et, sum, in, and qui, appearing more than 1,000 times each. Half of the 78,000 words in the poem are forms of only 321 different words. In other words, a student who knows those 321 words will know, on average, half the words on a page of the Metamorphoses. Three quarters of the total are forms of 1,200 different words. To get to 90%, you need 3,000 words. For 95%, 4,575 words suffice. The 95-percent level is significant because a student who knows 95% of the words in a text can usually figure out most of the rest from context (Nation and Coady [1988], Laufer [1997]). Hence, although Ovid’s vocabulary is large, there is no need to learn every single one of those 8,789 words before starting to read.
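Coverage figures of this kind come from walking down a frequency-sorted list and accumulating counts until a target share of the text is reached. The sketch below shows that calculation on invented data; the function name `words_needed` and the toy counts are assumptions for illustration and do not reproduce the Ovid figures or the tool's own code.

```python
from collections import Counter


def words_needed(lemma_counts, target):
    """Smallest number of most-frequent words covering `target` of all tokens.

    `lemma_counts` maps each dictionary word to its frequency in the text;
    `target` is a fraction between 0 and 1.
    """
    total = sum(lemma_counts.values())
    covered = 0
    ranked = Counter(lemma_counts).most_common()
    for rank, (_, n) in enumerate(ranked, start=1):
        covered += n
        if covered >= target * total:
            return rank
    return len(lemma_counts)


if __name__ == "__main__":
    # Hypothetical counts for a ten-word text.
    toy_counts = {"et": 5, "sum": 3, "in": 1, "qui": 1}
    print(words_needed(toy_counts, 0.50))  # 1
    print(words_needed(toy_counts, 0.75))  # 2
    print(words_needed(toy_counts, 0.95))  # 4
```

The same pattern appears at scale in the Metamorphoses: the first half of the text costs very few words, and each further step toward full coverage costs many more.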
Figure 1: Sample Vocabulary List for Cicero’s Pro Caelio
For each word in the list, the vocabulary tool also calculates what we call a “key term score.” (It’s calculated using a standard metric from information retrieval and computational linguistics, known as tf×idf. This calculation is described in Salton and Buckley [1988], Salton [1989], Singhal et al. [1996].) The key term score provides a guide to words that appear relatively frequently in the works on the vocabulary list but relatively infrequently in the rest of the collection in the Perseus Digital Library. Words with a high key term score provide an initial guide to important people, places, and concepts in your selection of texts. Frequently appearing words that provide less guidance about the contents of your selection will have a low key term score, and the least important words will have a score of zero. Very common words like sum or ille in Latin, eimi or outos in Greek, will always have a key term score of zero. Proper names, on the other hand, often have relatively high key term scores because they are often the most distinctive words in a text. Another way to look at it is that words with a non-zero key term score are the most useful words to learn before starting to read this text: they are the ones that an intermediate-level student might not already know, but that are frequent enough in this text to be worth learning. Although only the first five or ten “key terms” tell you about the content of the text, the complete key term list gives you an overview of the likely new vocabulary.
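A minimal sketch of a tf×idf score follows. It uses the plain textbook form, term frequency multiplied by log(N/df), which is an assumption: the exact weighting variant used by the vocabulary tool is not stated here, and the function `key_term_scores` and the three toy documents are hypothetical.

```python
import math
from collections import Counter


def key_term_scores(selected, collection):
    """Score each word in `selected` by tf x idf against `collection`.

    `selected` is the list of dictionary words in the chosen text(s);
    `collection` is a list of documents (each a list of dictionary words)
    and is assumed to include the selected text itself.
    """
    tf = Counter(selected)
    n_docs = len(collection)
    scores = {}
    for lemma, count in tf.items():
        # Document frequency: how many documents contain the word at all.
        df = sum(1 for doc in collection if lemma in doc)
        # ASSUMPTION: unsmoothed idf = log(N / df).
        idf = math.log(n_docs / df) if df else 0.0
        scores[lemma] = count * idf
    return scores


if __name__ == "__main__":
    toy_collection = [
        ["sum", "ille", "caelius", "caelius", "sum"],
        ["sum", "ille", "arma"],
        ["sum", "ille", "amor"],
    ]
    scores = key_term_scores(toy_collection[0], toy_collection)
    print(scores["sum"])          # 0.0
    print(scores["caelius"] > 0)  # True
```

With this form of the metric, a word that occurs in every document of the collection has an idf of log(1) = 0, which is consistent with very common words such as sum or ille always scoring zero, while a name confined to one text scores high.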
For example, let’s look at the words with high key term scores for two documents, Lysias’ On the Murder of Eratosthenes and Book 21 of the Odyssey. The top ten key words for On the Murder of Eratosthenes include the name Eratosthenes and words for adultery, a servant woman, a child, a door, and several words for entering a house. Likewise, the top key words for Book 21 of the Odyssey include Antinous, Odysseus, Telemachus, and nouns and verbs associated with stretching and stringing a bow. The key words for the two documents do not, of course, capture all of the nuances of the actions being described, but they do provide a useful overview of elements that are potentially important and of vocabulary that may be unfamiliar as you read the texts.
Finally, the vocabulary lists also include short definitions that have been automatically extracted from the Intermediate Liddell and Scott Greek Lexicon and Lewis’ Elementary Latin Dictionary (Rydberg-Cox [Forthcoming 2001]). Because this definition is the one listed first in the dictionary entry for each word, the definition provided for words with multiple senses may not be entirely correct for the works that you have selected, and words that are not in the medium-sized dictionaries won’t have definitions at all. The vocabulary tool has two different facilities to address the shortcomings of the automatically-extracted definitions. The words in the HTML vocabulary list are linked to the Perseus Word Study Tool (described further in Mahoney [2001]), from which you can look up the full definition in either the intermediate or unabridged dictionaries in the Perseus Digital Library. The vocabulary tool also provides the
