What words are most useful for learners of English? Introducing the New General Service List

Learning vocabulary is a complex process in which the learner needs to acquire both the form and a variety of meanings of a given vocabulary item. General vocabulary lists can support this process by identifying the most common vocabulary items. In response to problems identified in the currently available General Service List, the authors decided to investigate core English vocabulary in very large language corpora using current corpus linguistics technology.
The new-GSL is an English vocabulary baseline intended for both researchers and practitioners. It is based on a robust comparison of four corpora of general English with a total size of over 12 billion words. It contains 2,494 vocabulary items: 2,116 belong to a stable lexical core, and the remaining 378 represent lexical innovations. All of these words occur with high frequency across a large number of different contexts.
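One way to picture the split between the stable core and the lexical innovations is as set operations over the high-frequency wordlists taken from each corpus. The Python sketch below assumes (this is our illustrative reading, not a description of the published procedure) that the core consists of items ranked highly in every corpus, and that the innovations are items ranked highly in all of the more recent corpora but in none of the older ones. The function name, corpus labels and toy wordlists are hypothetical.

```python
# Minimal sketch: split per-corpus wordlists into a "core" and "innovations".
# Assumption (illustrative only): core = in every corpus's top list;
# innovations = in every recent corpus's top list but in no older one.


def split_core_and_innovations(wordlists, recent):
    """wordlists: dict of corpus label -> set of high-frequency lemmas.
    recent: labels of the corpora treated as representing current English."""
    # Items shared by all corpora form the stable core.
    core = set.intersection(*wordlists.values())
    # Items shared by all recent corpora.
    in_all_recent = set.intersection(*(wordlists[name] for name in recent))
    # Union of the older corpora's lists (empty set if there are none).
    in_any_older = set().union(
        *(words for name, words in wordlists.items() if name not in recent)
    )
    innovations = in_all_recent - in_any_older
    return core, innovations


# Hypothetical toy wordlists, not real corpus data.
toy_lists = {
    "older_1": {"time", "people", "way", "telegram"},
    "older_2": {"time", "people", "way", "telegram"},
    "recent_1": {"time", "people", "way", "internet", "online"},
    "recent_2": {"time", "people", "way", "internet", "email"},
}

core, innovations = split_core_and_innovations(
    toy_lists, recent={"recent_1", "recent_2"}
)
print(sorted(core))         # ['people', 'time', 'way']
print(sorted(innovations))  # ['internet']
```

Under these assumptions every core item also appears in the older lists, so the two groups can never overlap; this is consistent with the 2,116 core items and the 378 innovations adding up to the 2,494 items in the list.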
The article describing how the wordlist was compiled, as well as the full new-GSL, is available open access from the Applied Linguistics website.
At the moment, we are working on an American supplement to the new-GSL. Our findings show a surprisingly large overlap between frequent lexical items in British and American corpora. With some modifications, the new-GSL can therefore also be used successfully in American English contexts.
A larger question that the new-GSL raises, however, is how we reconcile our intuitions about important vocabulary items with corpus-based findings. In this respect, the new-GSL is a descriptive rather than a prescriptive wordlist. As we stress in the article, “[w]ith respect to the diversity of ESL/EFL contexts, it is deemed more useful to envision the use of our wordlist as a vocabulary base with the possibility of further additions, rather than a wordlist that strives to cater to a mixed cluster of heterogeneous expectations and needs” (p. 19).
Imagine you have just started learning a new foreign language. Which words do you need to learn first? We all might have some intuitions about this. If the language is English, then time (the most frequent noun in both speech and writing) will probably be more useful than, say, the adjective temporaneous (yes, the OED records this word). However, as corpus linguists know, intuitions are not to be trusted (at least not all the time). Only by analysing large amounts of textual data (yes, language corpora!) can we identify words that occur frequently across a number of different contexts.
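To make "frequent across a number of different contexts" concrete, here is a minimal Python sketch that combines raw frequency with range, i.e. the number of texts a word occurs in. The function name, thresholds and toy texts are our own illustration; range is only one simple dispersion measure and is not necessarily the one used to build the new-GSL.

```python
from collections import Counter

# Minimal sketch: keep words that are frequent overall AND occur in many texts.
# Range (number of texts containing a word) is used here as a simple stand-in
# for dispersion; it is not necessarily the measure used for the new-GSL.


def frequent_and_widespread(texts, min_freq, min_range):
    freq = Counter()        # total occurrences of each word
    text_range = Counter()  # number of texts each word appears in
    for text in texts:
        tokens = text.lower().split()   # naive whitespace tokenisation
        freq.update(tokens)
        text_range.update(set(tokens))  # count each word once per text
    return sorted(
        word for word in freq
        if freq[word] >= min_freq and text_range[word] >= min_range
    )


# Hypothetical toy texts, invented for illustration.
texts = [
    "time heals and time flies",
    "there is no time to lose",
    "temporaneous plans temporaneous aims temporaneous needs",
]

print(frequent_and_widespread(texts, min_freq=3, min_range=2))  # ['time']
```

Both time and temporaneous occur three times in these toy texts, but only time is spread across more than one text; a list built on frequency alone could not tell the two apart.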
The research Dana and I are going to talk about on Thursday will look at the methodology of creating a pedagogical wordlist, the new-GSL (the old one is now really out of date), which can assist both learners and teachers in acquiring basic English vocabulary. We’ll be looking at the ways in which both large (BNC, EnTenTen12) and small (LOB, BE06) corpora can be used to create such a wordlist.