The British National Corpus 2014
The British National Corpus 2014 is a major project led by Lancaster University to create a 100-million-word corpus (a large collection of ‘real life’ language) of modern-day British English. Researchers will use the corpus to understand more about how language works and how it is evolving. Educators, dictionary compilers and the interested public will also be able to access the corpus to find usage examples of modern British English in different genres.
The first stage of the project is complete: the Spoken BNC2014 has been released via Lancaster’s CQPweb. The second stage involves creating a written counterpart to the Spoken BNC2014: the Written BNC2014 (see below).
Update: We are working towards a public release of the Written BNC2014 in 2020. Currently, a balanced subset of the corpus, BNC2014 Baby+, is available via #LancsBox.
The Written BNC
The Written BNC2014 will be a new version of the written section of the original British National Corpus, which is now over 20 years old. The corpus will allow for diachronic comparisons with the original BNC (BNC1994), whilst being representative of current British English. We are collecting samples from fiction, academic journals, newspapers, magazines, blogs and more.
The written British National Corpus 2014 (Written BNC2014) is being compiled by a team of researchers at the ESRC Centre for Corpus Approaches to Social Science (CASS), Lancaster University, led by Vaclav Brezina and Tony McEnery.
The sampling frame was proposed by Abi Hawtin, based on her doctoral research and the sampling frame of the original British National Corpus (BNC1994). The project has been supported by ESRC grants no. EP/P001559/1, ES/K002155/1 and ES/R008906/1.
Would you like to contribute to the Written BNC2014?
- Book collection [Already submitted] [British authors] [Instructions]
- Student essay collection (school-level or university-level essays)
- Email collection (anonymised) [Instructions]
- SMS collection (anonymised) [WhatsApp Instructions] [FB Instructions]
- Tweet collection (anonymised) [Instructions]
How to get access?
- BNC2014 Spoken
  - Data download: data in XML format can be downloaded from http://corpora.lancs.ac.uk/bnc2014/
  - BNClab: go to BNClab and start searching the corpus. BNClab offers sociolinguistic analyses on the fly as well as a comparison with the demographic part of BNC1994.
  - CQPweb:
    - Register for free and log on to CQPweb.
    - Sign up for access to the BNC2014 Spoken.
    - Select ‘BNC2014’ in the main CQPweb menu.
- BNC1994 (original British National Corpus)