|
Review of R and p values
Review HW
HW for Thursday
STRUCTURED CORPORA
Examples of structured corpora (made to be used as corpora, easily searched, tagged)
British National Corpus
10 million spoken, 90 million written
Different registers (novels, newspapers, tabloids, conversations, lectures)
1980s to 1990s
Tagged for part of speech
Limitations: little spoken, only British, only modern, only native speakers, only adults, no phonetics
Corpus of Contemporary American English
385 million words and growing
1990-2008
Corpus del Español
Time Magazine
LDS General Conference Talks
What could the following corpora be used for? What are their limitations?
Daily Universe online
Printed book
Letters of Abraham Lincoln
General Conference
The Web (via Google)
General methodology
Get out what is put in: textual
Get out what is put in: interface
Different corpora for different purposes (International Corpus of Learner English, CHILDES, corpora by country)
Possible uses of corpora
Linguistic variation (words, phrases, syntax in different registers)
Historical change (words entering/leaving languages, differences between centuries)
Stylistic variation (e.g. NY Times vs Washington Times, partitive used by Pres. Hinckley, different authors)
Frequency information (top x words for frequency dictionary, etc.)
Reaction times depend on frequency and need to be controlled for
What vocabulary do you include in a TESOL book? (Davies and Face study)
What church vocabulary do foreign missionaries need? (use General Conference corpus)
Creating your own corpus
Corpora of web-based materials can sometimes be built quickly
Often, though, quite time-consuming (especially spoken)
Copyright issues
Corpora of English
Brown Corpus / LOB (1960s) - 1 million words (glistening, knob)
International Corpus of English (ICE) - 1980s to present, 1 million words, 50% spoken
What if you found 4 cases of knob in Canada and 2 in New Zealand?
British National Corpus (BNC) / Cobuild (1980s-90s) - hundreds of millions of words
Specialized, like the International Corpus of Learner English (ICLE)
What could this corpus be used for?
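The question above about 4 cases of knob in Canada and 2 in New Zealand can be checked with a p value. The sketch below is one possible approach, not necessarily the one used in class: it assumes the two subcorpora are the same size (about 1 million words each), so that under the null hypothesis each of the 6 occurrences is equally likely to come from either country, and it runs an exact two-sided binomial test. The function name `two_sided_binomial_p` is hypothetical.

```python
from math import comb


def two_sided_binomial_p(k, n, p=0.5):
    """Exact two-sided binomial p value.

    Sums the probabilities of every outcome that is no more likely
    than the observed count k out of n trials.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small tolerance so that ties are included despite float rounding.
    return sum(pr for pr in probs if pr <= observed * (1 + 1e-9))


# 4 of the 6 occurrences come from the Canadian subcorpus.
# Assumption: both subcorpora contain the same number of words.
p_value = two_sided_binomial_p(4, 6)
print(round(p_value, 4))
```

Under these assumptions the p value is far above 0.05, so a 4 versus 2 split in corpora this small says very little about a real difference between the two varieties.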
Types of procedures
Wordlists
Concordances
Collocations
Keyword lists
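The four procedures above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any particular corpus tool works: the sample sentences, the function names, and the crude regular-expression tokenizer are all hypothetical, and the keyword score is a simple smoothed frequency ratio, whereas real tools typically use a statistic such as log-likelihood.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (very rough)."""
    return re.findall(r"[a-z']+", text.lower())


def wordlist(tokens, top=None):
    """Wordlist: words ranked by frequency."""
    return Counter(tokens).most_common(top)


def concordance(tokens, node, width=2):
    """Concordance: each hit of `node` with `width` words of context."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines


def collocates(tokens, node, window=2):
    """Collocations: counts of words within `window` words of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[max(0, i - window):i])
            counts.update(tokens[i + 1:i + 1 + window])
    return counts


def keywords(target, reference, top=5):
    """Keyword list: words unusually frequent in `target` relative to
    `reference`, scored by a smoothed relative-frequency ratio."""
    t, r = Counter(target), Counter(reference)
    nt, nr = len(target), len(reference)
    score = {w: (t[w] / nt) / ((r[w] + 1) / (nr + 1)) for w in t}
    return sorted(score, key=score.get, reverse=True)[:top]


# Invented sample texts, for illustration only.
sample = "The knob on the door was a brass knob. She turned the knob and the door opened."
reference = "The cat sat on the mat and the dog sat by the door."
tokens = tokenize(sample)
ref_tokens = tokenize(reference)

print(wordlist(tokens, top=3))
for line in concordance(tokens, "knob"):
    print(line)
print(collocates(tokens, "knob").most_common(2))
print(keywords(tokens, ref_tokens, top=1))
```

On a real corpus the same functions would be run over millions of tokens, and function words such as "the" would normally be filtered out of the collocate list.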
Next time -- the Wild World Wide Web (via Google)
|