Review of R and p values

Review HW

HW for Thursday


STRUCTURED CORPORA

  1. Examples of structured corpora (made to be used as corpora, easily searched, tagged)

    1. British National Corpus

      1. 10 million words spoken, 90 million words written

      2. Different registers (novels, newspapers, tabloids, conversations, lectures)

      3. 1980s to 1990s

      4. Tagged for part of speech

      5. Limitations: little spoken data, only British, only modern, only native speakers, only adults, no phonetics

    2. Corpus of Contemporary American English

      1. 385 million words and growing

      2. 1990-2008

    3. Corpus del Español

    4. Time Magazine

    5. LDS General Conference Talks

  2. What could the following corpora be used for? What are their limitations?

    1. Daily Universe online

    2. Printed book

    3. Letters of Abraham Lincoln

    4. General Conference

    5. The Web (via Google)

  3. General methodology

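    1. You get out only what is put in: textual (a corpus can only show what its texts contain)

    2. You get out only what is put in: interface (you can only run the searches the interface allows)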

    3. Different corpora for different purposes (International Corpus of Learner English, CHILDES, Corpora by country)

  4. Possible uses of corpora

    1. Linguistic variation (words, phrases, syntax in different registers)

    2. Historical change (words entering/leaving languages, differences between centuries)

    3. Stylistic variation (e.g. NY Times vs Washington Times, partitive used by Pres. Hinckley, different authors)

    4. Frequency information (top x words for a frequency dictionary, etc.)

      1. Reaction times depend on frequency and need to be controlled for

      2. What vocabulary do you include in a TESOL book? (Davies and Face study)

      3. What church vocabulary do foreign missionaries need? (use General Conference corpus)

  5. Creating your own corpus

    1. A corpus of web-based materials can sometimes be built quickly

    2. Often, though, building one is quite time-consuming (especially for spoken data)

    3. Copyright issues

  6. Corpora of English

    1. Brown Corpus / LOB (1960s) - 1 million words each (glistening, knob)

    2. International Corpus of English (ICE) - 1980s to present, 1 million words per country, about 60% spoken

      1. What if you found 4 cases of knob in Canada and 2 in New Zealand?

    3. British National Corpus (BNC) / Cobuild (1980s-90s) - hundreds of millions of words

    4. Specialized, like the International Corpus of Learner English (ICLE)

      1. What could this corpus be used for?

  7. Types of procedures

    1. Wordlists

    2. Concordances

    3. Collocations

    4. Keyword lists

  8. Next time -- the Wild World Wide Web (via Google)
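
The four procedures listed under "Types of procedures" can be sketched in a few lines of Python. This is a toy illustration only, not the method used by any of the corpora above: the function names, the tiny sample texts, and the window sizes are all made up for the example, and the keyword score is a plain smoothed frequency ratio rather than the log-likelihood statistic that corpus tools often use.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and pull out runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())

def wordlist(tokens, top=10):
    """Wordlist: the most frequent words with their counts."""
    return Counter(tokens).most_common(top)

def concordance(tokens, node, width=3):
    """Concordance (KWIC): each hit of `node` with `width` words of context."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

def collocates(tokens, node, window=2):
    """Collocations: count words within `window` tokens of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[max(0, i - window):i])
            counts.update(tokens[i + 1:i + 1 + window])
    return counts

def keywords(tokens, ref_tokens, top=5):
    """Keyword list: words relatively more frequent here than in a reference
    corpus (simple ratio with add-one smoothing; an assumption of this sketch)."""
    target, ref = Counter(tokens), Counter(ref_tokens)
    n_t, n_r = len(tokens), len(ref_tokens)
    scores = {w: (c / n_t) / ((ref[w] + 1) / (n_r + 1)) for w, c in target.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top]

# Hypothetical mini "corpora", invented for illustration.
sample = "The door knob was glistening. She turned the knob and the door opened."
reference = "The cat sat on the mat and the dog sat too."
tokens = tokenize(sample)
ref_tokens = tokenize(reference)

print(wordlist(tokens, top=3))
print(concordance(tokens, "knob", width=2))
print(collocates(tokens, "knob", window=2).most_common(3))
print(keywords(tokens, ref_tokens, top=2))
```

With real corpora the same ideas apply, but raw counts should be normalized (e.g. per million words) before comparing corpora of different sizes.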