Thu 20 Jun 2013, 2pm - 3pm, Lancaster University, FASS Meeting Room 1
UCREL Corpus Research Seminar: Presenting the new General Service List: Rationale, method, implications
Presenters: Vaclav Brezina & Dana Gablasova
Learning vocabulary is a complex process in which the learner needs to acquire both the form and a variety of meanings/uses of a given lexical item (Nation, 2001). For the beginner, the main question is, of course, where to start. General vocabulary wordlists can assist in this process by providing a list of common vocabulary items. Although a number of general vocabulary lists are available, by far the most influential and widely used, both in pedagogy and in vocabulary research, is West's General Service List (GSL) (Carter 2012). However, a number of problems with West's GSL have been pointed out over the years (cf. Gilner 2011).
In response to the problems identified with the GSL, this study offers a bottom-up, quantitative approach to the development of a New General Service List (new-GSL) by examining frequent general words across four language corpora (LOB, BNC, BE06 and EnTenTen12) with a combined size of almost 13 billion running words. The four corpora were selected to represent a variety of corpus sizes (from one million to over 12 billion tokens) and approaches to representativeness and sampling (from small samples to whole documents). The study provides strong evidence of the stability of the core English vocabulary across a variety of language corpora, including different written and spoken contexts. We examined the overlap between the 3,000 most frequent vocabulary items in each corpus and identified substantial correspondence between the four corpora in terms of both the number of shared items (71%) and the distribution of the words in the wordlists (as established by a series of Spearman's correlations). The final product, the new-GSL, consists of a total of 2,496 words. It is divided into a base part (2,118 items) and a current vocabulary part (378 items). The new-GSL covers between 80.1 and 81.7 per cent of text in the source corpora, which is comparable to the coverage of West's GSL. In its present form, the new-GSL can be used both for lexical research and for the development of teaching materials.
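The two comparisons mentioned above (shared items across frequency lists, and rank agreement via Spearman's correlation) can be illustrated with a minimal Python sketch. This is not the presenters' procedure: the abstract does not say how the overlap or the correlations were computed, so the function names, the handling of words missing from one list, and the five-word toy lists are all assumptions made for illustration.

```python
def shared_items(wordlists):
    """Return the set of words that occur in every ranked wordlist."""
    return set.intersection(*(set(wl) for wl in wordlists))


def spearman_rho(list_a, list_b):
    """Spearman's rho for two ranked wordlists (most frequent word first).

    Assumption: only words present in both lists are compared, and they
    are re-ranked 1..n within each list, so there are no tied ranks.
    """
    common = set(list_a) & set(list_b)
    n = len(common)
    if n < 2:
        raise ValueError("need at least two shared words")
    rank_a = {w: r for r, w in enumerate((w for w in list_a if w in common), 1)}
    rank_b = {w: r for r, w in enumerate((w for w in list_b if w in common), 1)}
    d_squared = sum((rank_a[w] - rank_b[w]) ** 2 for w in common)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Toy frequency lists (hypothetical data, not taken from the real corpora).
corpus_a = ["the", "of", "and", "to", "in"]
corpus_b = ["the", "and", "of", "to", "a"]
corpus_c = ["the", "of", "to", "and", "in"]

shared = shared_items([corpus_a, corpus_b, corpus_c])
share_of_list = len(shared) / len(corpus_a)  # proportion of a top-N list
rho_ab = spearman_rho(corpus_a, corpus_b)
```

With real data, each list would hold the 3,000 most frequent items of one corpus, and the correlation would be computed for every pair of corpora.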