Spoken BNC2014

Compiling a new, publicly accessible corpus of British English conversation

The Spoken BNC2014 is now accessible online in full, free of charge, for research and teaching purposes. To access the corpus, you should first create a free account on Lancaster University’s CQPweb server (https://cqpweb.lancs.ac.uk/) if you do not already have one. Once registered, please visit the BNC2014 website (http://corpora.lancs.ac.uk/bnc2014) to (a) sign the corpus’ end-user licence and (b) register your CQPweb account – following the instructions on the site. When you return to CQPweb, you will have access to the Spoken BNC2014 via the link that appears in the list of ‘Present-day English’ corpora. While access is initially only via the CQPweb platform, the underlying corpus XML files and associated metadata will be available for download in Autumn 2018. The BNC2014 website also contains lots of useful information about the corpus, and in particular a downloadable manual and reference guide.

This project is a collaboration between CASS and Cambridge University Press. Together, we have collected samples of real-life, informal, spoken interactions between speakers of British English from across the United Kingdom. The transcriptions of these recordings form a corpus known as the Spoken British National Corpus 2014 (Spoken BNC2014), which was released publicly in September 2017 on CQPweb’s Lancaster server.

The audio recordings contain face-to-face conversations between people who speak British English as their first language, collected between 2012 and 2016. The recordings could be on any subject, and speakers were aware of being recorded as they conversed. In total the corpus comprises over 10 million words.

We used the year 2014 in the name of the corpus for three reasons: it commemorates the 20th anniversary of the release of the original British National Corpus (1994); it is the year in which CASS and CUP launched the project; and, perhaps most importantly, it is the median year of the data, which was collected between 2012 and 2016.

In 2015, we announced that some of the data would be released early to selected researchers who could apply with a research proposal. Since then, a dozen fascinating research projects have been conducted and we look forward to publishing them; some will appear in a forthcoming special issue of the International Journal of Corpus Linguistics (edited by Tony McEnery, Robbie Love & Vaclav Brezina) in 2017, and others are being published in a Routledge book (edited by Vaclav Brezina, Robbie Love & Karin Aijmer) in 2018.

On Monday 26th June 2017, we will host a half-day symposium at Lancaster University to celebrate the upcoming release of the Spoken BNC2014.

Looking ahead

Earlier in 2017 we announced plans to build a large-scale extension to the Spoken BNC2014, using audio recordings from the BBC Listening Project, which are archived by the British Library. Working with the BBC and the British Library, we will undertake transcription of a large number of recordings from hard-to-reach areas of the UK. Once completed, the transcripts will be made available as a supplement to the Spoken BNC2014.

Related publications

McEnery, T. and Love, R. (fc). Bad Language. In Culpeper, J., F. Katamba, P. Kerswill, R. Wodak and T. McEnery (eds.). (fc). English Language: Description, Variation and Context (2nd ed.). London: Palgrave.

Brezina, V., R. Love and K. Aijmer (eds.). (2018 fc). Corpus Approaches to Contemporary British Speech: Sociolinguistic studies of the Spoken BNC2014. New York: Routledge.

Love, R., Dembry, C., Hardie, A., Brezina, V. and McEnery, T. (2017 fc). The Spoken BNC2014: designing and building a spoken corpus of everyday conversations. In International Journal of Corpus Linguistics, 22:3.

McEnery, T., Love, R. and Brezina, V. (eds.). (2017 fc). International Journal of Corpus Linguistics, 22:3, Special Issue.

Related conference papers & public talks

Love, R. (2017 fc). Bad language revisited: swearing in the Spoken BNC2014. Corpus Linguistics 2017 Conference. University of Birmingham, UK. July 2017.

Love, R. and Hardie, A. (2017 fc). Introducing the Spoken BNC2014 – explore the data yourself. Pre-conference workshop. Corpus Linguistics 2017 Conference. University of Birmingham, UK. July 2017.

Love, R. and Dembry, C. (2017 fc). Introducing the Spoken BNC2014. Spoken BNC2014 symposium. Lancaster University, UK. June 2017.

Love, R. (2017). FUCK in spoken British English revisited with the Spoken BNC2014. ICAME 38 Conference. Charles University, Czech Republic. May 2017.

Love, R. (2016). “Accent – General American; Dialect – British English”: reflections on tricky metadata in the Spoken BNC2014. American Association for Corpus Linguistics (AACL) 2016 Conference. Iowa State University, Ames, Iowa, USA. September 2016.

Love, R. (2016). Sociolinguistics for spoken corpora: swearing in the Spoken BNC2014. Sociolinguistics Summer School 7. Université de Lyon, France. June 2016.

Love, R. (2016). “Normal with a brummy twang”: dealing with metadata in the Spoken BNC2014. IVACS 2016 Conference. Bath Spa University, UK. June 2016.

Love, R. (2015). Spoken English in UK society. ESRC Language Matters: Communication, Culture, and Society. International Anthony Burgess Foundation, Manchester, UK. November 2015.

Love, R. and Dembry, C. (2015). Who says what in spoken corpora?: speaker identification in the Spoken BNC2014. Corpus Linguistics 2015 Conference. Lancaster University, UK. July 2015.

Dembry, C. and Love, R. (2015). Collecting the new Spoken BNC2014 – overview of methodology. Corpus Linguistics 2015 Conference. Lancaster University, UK. July 2015.

Love, R. (2015). Critical issues in spoken corpus development: defining a transcription schema for the spoken BNC2014. ICAME 36 Conference. University of Trier, Germany. May 2015.

McEnery, T., Love, R. and Dembry, C. (2014). Words ‘yesterday and today’. ESRC Language Matters: Communication, Culture, and Society. Royal United Services Institute, London, UK. November 2014.

Dembry, C. and Love, R. (2014). Spoken English in Today’s Britain. Cambridge Festival of Ideas. Cambridge University, UK. October 2014.


Team:

Co-Investigator: Tony McEnery

Co-Investigator: Claire Dembry (Cambridge University Press)

Co-Investigator: Andrew Hardie

Senior Research Associate: Vaclav Brezina

Research Student: Robbie Love


Read the latest updates on this project:

  • Words, words, words: A new Frequency Dictionary of British English (6 December 2023)

    If you want to know how frequently words are used in different contexts across speech and writing and with what other words these are associated, you might be interested in a new dictionary, which has just come out. This dictionary is based on the British National Corpus 2014, a large balanced dataset developed at Lancaster …

  • Celebrating the Written BNC2014: Lancaster Castle event (25 November 2021)

    On 19 November 2021, the ESRC Centre for Corpus Approaches to Social Science (CASS) organised an event to celebrate the launch of the Written British National Corpus 2014 (BNC2014). The event was live-streamed from a very special location: the medieval Lancaster Castle. There were about 20 participants on the site and more than 1,200 …

  • Introductory Blog – Hanna Schmueck (18 November 2020)

    I am very honoured to have received the Geoffrey Leech Outstanding MA Student Award for my MA in Language and Linguistics. This award traditionally goes to the MA student with the highest overall average. I started my postgraduate journey in September 2019 after finishing my undergraduate degree at the University of Bamberg (Germany) in 2018 …

  • CASS in the City (12 March 2019)

    CASS in the City: Introducing BNClab to the general public. Last Saturday (9th March) a group of students led by Vaclav Brezina and Dana Gablasova took part in the Campus in the City event, organised by Lancaster University. The main aim of this event is to show research highlights to a general audience. So, we decided to …

  • ‘Using corpora to teach sociolinguistics’ at the TaLC conference in Cambridge (22 July 2018)

    Last week, the Faculty of Education, The University of Cambridge hosted the 13th Teaching and Language Corpora Conference. This wonderful event brought together researchers and practitioners interested in different applications of corpus techniques in the classroom. Dana Gablasova and I with the help of Irene Marin Cervantes and Tanjun Liu gave a practical workshop introducing the idea …

  • My experience with working at CASS as a SPRINT intern (17 July 2018)

    Over the last few weeks I have been working at the ESRC Centre for Corpus Approaches, Lancaster University (CASS) as part of the SPRINT 2018 internship programme. I have just finished my second year studying Spanish and Linguistics and this project was particularly interesting to me from a linguistic perspective. I wanted to work with …

  • Is Academic Writing Becoming More Colloquial? (10 July 2018)

    Have you noticed that academic writing in books and journals seems less formal than it used to? Preliminary data from the Written BNC2014 shows that you may be right! Some early data from the academic journals and academic books sections of the new corpus has been analysed to find out whether academic writing has become more …

  • British National Corpus 2014: A sociolinguistic book is out (1 June 2018)

    Have you ever wondered what real spoken English looks like? Have you ever asked the question of whether people from different backgrounds (based on gender, age, social class etc.) use language differently? Have you ever thought it would be interesting to investigate how much English has changed over the last twenty years? All these questions …

  • Learn about the BNC2014, scan a book sample and contribute to the corpus… (14 May 2018)

    On Saturday 12 May 2018, CASS hosted a small training event at Lancaster University for a group of participants, who came from different universities in the UK. We talked about the BNC2014 project and discussed both the theoretical underpinnings and the practicalities of corpus design and compilation. Slides from the event are available …

  • The Spoken BNC2014 is now available! (25 September 2017)

    On behalf of Lancaster University and Cambridge University Press, it gives us great pleasure to announce the public release of the Spoken British National Corpus 2014 (Spoken BNC2014). The Spoken BNC2014 contains 11.5 million words of transcribed informal British English conversation, recorded by (mainly English) speakers between the years 2012 and 2016. The situational context of …