The British National Corpus 2014

The British National Corpus 2014 (BNC2014) is a major project led by Lancaster University. We created a 100-million-word corpus (a large collection of ‘real life’ language) of present-day British English. Researchers can use this corpus to understand more about how language works and how it is evolving. Educators, dictionary compilers and the interested public can also access the corpus to find usage examples of modern British English in different genres.

The whole corpus is now available for non-commercial research purposes.

The project has been supported by ESRC grant nos. EP/P001559/1, ES/K002155/1 and ES/R008906/1.


How to get access?

BNC2014 Written

The corpus is freely available (together with the spoken part) via #LancsBox X. All major research functionalities are available via this tool.

We are also looking into the possibility of releasing the BNC2014 Written via other popular platforms. For copyright reasons, the full texts of the written corpus cannot be released at this stage.

BNC2014 Spoken

  • Data download

Data in XML format can be downloaded from:

  • BNClab

Go to BNClab and start searching the corpus. BNClab offers sociolinguistic analyses on the fly as well as a comparison with the demographic part of BNC1994.

  • CQPweb
  1. Register for free and log on to CQPweb.
  2. Sign up for access to the BNC2014 Spoken.
  3. Select ‘BNC2014’ in the main CQPweb menu.
  • BNC1994 (original British National Corpus)
  1. Register for free and log on to CQPweb.
  2. Select ‘British National Corpus (XML edition)’ in the main CQPweb menu.
  • BNC2014 Baby+ (5M words)

A balanced subset of the corpus, BNC2014 Baby+, is available via #LancsBox v. 6.