The British National Corpus 2014

The British National Corpus 2014 (BNC2014) is a major project led by Lancaster University. We created a 100-million-word corpus (a large collection of ‘real-life’ language) of present-day British English. Researchers can use this corpus to understand more about how language works and how it is evolving. Educators, dictionary compilers and the interested public will also be able to access the corpus to find usage examples of modern British English in different genres.

The spoken part of the corpus (10 million words) has already been released. We will officially release the written part of the corpus (90 million words) on 19th November via #LancsBox X, a software package developed at Lancaster University. We are also planning a later release of the corpus via other platforms, to give users the flexibility to select the tools that best suit their research needs.

The project has been supported by ESRC grant nos. EP/P001559/1, ES/K002155/1 and ES/R008906/1.


How to get access?

  • BNC2014 Spoken
  • Data download

Data in XML format can be downloaded from:

  • BNClab

Go to BNClab and start searching the corpus. BNClab offers on-the-fly sociolinguistic analyses as well as a comparison with the demographic component of BNC1994.

  • CQPweb
  1. Register for free and log on to CQPweb.
  2. Sign up for access to the BNC2014 Spoken.
  3. Select ‘BNC2014’ in the main CQPweb menu.
  • BNC1994 (original British National Corpus)
  1. Register for free and log on to CQPweb.
  2. Select ‘British National Corpus (XML edition)’ in the main CQPweb menu.
  • BNC2014 Baby+ (5M words)

A balanced subset of the corpus, BNC2014 Baby+, is available via #LancsBox v. 6.