Imagine you have just started learning a new foreign language. Which words do you need to learn first? We probably all have some intuitions about this. If the language is English, then "time", the most frequent noun in both speech and writing, will probably be more useful than, say, the adjective "temporaneous" (yes, the OED records this word). However, as corpus linguists know, intuitions are not to be trusted, at least not all the time. Only through the analysis of large amounts of textual data (yes, language corpora!) can we identify the words that occur frequently across a number of different contexts.
The research Dana and I are going to present on Thursday looks at the methodology of creating a pedagogical wordlist, the new General Service List (new-GSL), which can help both learners and teachers with the acquisition of basic English vocabulary (the original GSL is now seriously out of date). We'll be looking at how both large corpora (BNC, EnTenTen12) and small corpora (LOB, BE06) can be used to create such a wordlist.
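For readers who want to see what "frequent across different contexts" can mean in practice, here is a minimal Python sketch. It is not the new-GSL methodology. It simply assumes that each corpus counts as one "context", counts how often each word occurs overall (frequency) and in how many corpora it appears (range), and keeps only the words found in every corpus. The corpus names and sentences in CORPORA, and the helper function names, are hypothetical placeholders rather than real BNC, EnTenTen12, LOB or BE06 data; real wordlist studies work with far larger samples and typically use more refined dispersion measures.

```python
import re
from collections import Counter

# Hypothetical toy "corpora" standing in for real ones such as the BNC or BE06.
# These are made-up sentences, not real corpus data.
CORPORA = {
    "news": "The time has come. The minister said time is short.",
    "fiction": "She had no time to lose. The door was open.",
    "academic": "The results show that time and frequency matter.",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def frequency_and_range(corpora):
    """Return (total frequency, range) Counters.

    Range here means the number of corpora in which a word occurs at least once.
    """
    total = Counter()
    ranges = Counter()
    for text in corpora.values():
        counts = Counter(tokenize(text))
        total.update(counts)          # add this corpus's counts to the totals
        ranges.update(counts.keys())  # each word counts once per corpus
    return total, ranges


def candidate_wordlist(corpora, min_range=None, top_n=10):
    """Keep words found in at least min_range corpora, ranked by total frequency."""
    if min_range is None:
        min_range = len(corpora)  # default: the word must occur in every corpus
    total, ranges = frequency_and_range(corpora)
    kept = [word for word in total if ranges[word] >= min_range]
    # Sort by descending frequency, then alphabetically to break ties.
    kept.sort(key=lambda word: (-total[word], word))
    return kept[:top_n]


if __name__ == "__main__":
    print(candidate_wordlist(CORPORA))
```

On these toy texts, `candidate_wordlist(CORPORA)` returns `['the', 'time']`: both words occur four times in total and appear in all three texts, while words such as "door" or "minister" occur in only one.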