Spoken BNC2014

Compiling a new, publicly accessible corpus of British English conversation

This project is a collaboration between CASS and Cambridge University Press. Together, we have collected samples of real-life, informal, spoken interactions between speakers of British English from across the United Kingdom. The transcriptions of these recordings form a corpus known as the Spoken British National Corpus 2014 (Spoken BNC2014), which will be made publicly available in the autumn of 2017 on CQPweb’s Lancaster server.

The audio recordings contain face-to-face conversations between people who speak British English as their first language, collected between 2012 and 2016. The conversations could be on any subject, and speakers knew they were being recorded. In total, the corpus comprises over 10 million words.

We used the year 2014 in the name of the corpus for three reasons: it commemorates the 20th anniversary of the release of the original British National Corpus (1994); it is the year in which CASS and Cambridge University Press (CUP) launched the project; and, perhaps most importantly, it is the median year of the data, which was collected between 2012 and 2016.

In 2015, we announced that some of the data would be released early to selected researchers, who could apply by submitting a research proposal. Since then, a dozen fascinating research projects have been conducted, and we look forward to publishing them: some will appear in a forthcoming special issue of the International Journal of Corpus Linguistics (edited by Tony McEnery, Robbie Love & Vaclav Brezina) in 2017, and others will be published in a Routledge book (edited by Vaclav Brezina, Robbie Love & Karin Aijmer) in 2018.

On Monday 26th June 2017, we will host a half-day symposium at Lancaster University to celebrate the upcoming release of the Spoken BNC2014.

Looking ahead

Earlier in 2017, we announced plans to build a large-scale extension to the Spoken BNC2014, using audio recordings from the BBC Listening Project, which are archived by the British Library. Working with the BBC and the British Library, we will transcribe a large number of recordings from hard-to-reach areas of the UK. Once completed, the transcripts will be made available as a supplement to the Spoken BNC2014.

Related publications

McEnery, T. and Love, R. (fc). Bad Language. In Culpeper, J., F. Katamba, P. Kerswill, R. Wodak and T. McEnery (eds.). (fc). English Language: Description, Variation and Context (2nd ed.). London: Palgrave.

Brezina, V., R. Love and K. Aijmer (eds.). (2018 fc). Corpus Approaches to Contemporary British Speech: Sociolinguistic Studies of the Spoken BNC2014. New York: Routledge.

Love, R., Dembry, C., Hardie, A., Brezina, V. and McEnery, T. (2017 fc). The Spoken BNC2014: designing and building a spoken corpus of everyday conversations. International Journal of Corpus Linguistics, 22:3.

McEnery, T., Love, R. and Brezina, V. (eds.). (2017 fc). International Journal of Corpus Linguistics, 22:3, Special Issue.

Related conference papers & public talks

Love, R. (2017 fc). Bad language revisited: swearing in the Spoken BNC2014. Corpus Linguistics 2017 Conference. University of Birmingham, UK. July 2017.

Love, R. and Hardie, A. (2017 fc). Introducing the Spoken BNC2014 – explore the data yourself. Pre-conference workshop. Corpus Linguistics 2017 Conference. University of Birmingham, UK. July 2017.

Love, R. and Dembry, C. (2017 fc). Introducing the Spoken BNC2014. Spoken BNC2014 symposium. Lancaster University, UK. June 2017.

Love, R. (2017). FUCK in spoken British English revisited with the Spoken BNC2014. ICAME 38 Conference. Charles University, Czech Republic. May 2017.

Love, R. (2016). “Accent – General American; Dialect – British English”: reflections on tricky metadata in the Spoken BNC2014. American Association for Corpus Linguistics (AACL) 2016 Conference. Iowa State University, Ames, Iowa, USA. September 2016.

Love, R. (2016). Sociolinguistics for spoken corpora: swearing in the Spoken BNC2014. Sociolinguistics Summer School 7. Université de Lyon, France. June 2016.

Love, R. (2016). “Normal with a brummy twang”: dealing with metadata in the Spoken BNC2014. IVACS 2016 Conference. Bath Spa University, UK. June 2016.

Love, R. (2015). Spoken English in UK society. ESRC Language Matters: Communication, Culture, and Society. International Anthony Burgess Foundation, Manchester, UK. November 2015.

Love, R. and Dembry, C. (2015). Who says what in spoken corpora?: speaker identification in the Spoken BNC2014. Corpus Linguistics 2015 Conference. Lancaster University, UK. July 2015.

Dembry, C. and Love, R. (2015). Collecting the new Spoken BNC2014 – overview of methodology. Corpus Linguistics 2015 Conference. Lancaster University, UK. July 2015.

Love, R. (2015). Critical issues in spoken corpus development: defining a transcription schema for the spoken BNC2014. ICAME 36 Conference. University of Trier, Germany. May 2015.

McEnery, T., Love, R. and Dembry, C. (2014). Words ‘yesterday and today’. ESRC Language Matters: Communication, Culture, and Society. Royal United Services Institute, London, UK. November 2014.

Dembry, C. and Love, R. (2014). Spoken English in Today’s Britain. Cambridge Festival of Ideas. Cambridge University, UK. October 2014.


Team:

Co-Investigator: Tony McEnery

Co-Investigator: Claire Dembry (Cambridge University Press)

Co-Investigator: Andrew Hardie

Senior Research Associate: Vaclav Brezina

Research Student: Robbie Love


Read the latest updates on this project:

  • Introducing a new project with the British Library (21 February 2017)

    Since 2012 the BBC have been working with the British Library to build a collection of intimate conversations from across the UK in the BBC Listening Project. Through its network of local radio stations, and with the help of a travelling recording booth the BBC has captured many conversations of people, who are well known ...

  • Spoken BNC2014 book announcement (5 August 2016)

    We are excited to announce a forthcoming book which will be published as part of the Routledge Advances in Corpus Linguistics series. “Corpus Approaches to Contemporary British Speech: Sociolinguistic Studies of the Spoken BNC2014” (edited by Vaclav Brezina, Robbie Love and Karin Aijmer) will feature a collection of research which is currently being undertaken by ...

  • The Spoken BNC2014 early access projects: Part 4 (16 March 2016)

    In January, we announced the recipients of the Spoken BNC2014 Early Access Data Grants. Over the next several months, they will use exclusive access to the first five million words of Spoken BNC2014 data to carry out a total of thirteen research projects. In this series of blogs, we are excited to share more information about ...

  • The Spoken BNC2014 early access projects: Part 3 (7 March 2016)

    In January, we announced the recipients of the Spoken BNC2014 Early Access Data Grants. Over the next several months, they will use exclusive access to the first five million words of Spoken BNC2014 data to carry out a total of thirteen research projects. In this series of blogs, we are excited to share more information about ...

  • The Spoken BNC2014 early access projects: Part 2 (4 March 2016)

    In January, we announced the recipients of the Spoken BNC2014 Early Access Data Grants. Over the next several months, they will use exclusive access to the first five million words of Spoken BNC2014 data to carry out a total of thirteen research projects. In this series of blogs, we are excited to share more information about ...

  • The Spoken BNC2014 early access projects: Part 1 (1 March 2016)

    In January, we announced the recipients of the Spoken BNC2014 Early Access Data Grants. Over the next several months, they will use exclusive access to the first five million words of Spoken BNC2014 data to carry out a total of thirteen research projects. In this series of blogs, we are excited to share more information about ...

  • What’s wrong with “a bunch of migrants”? Looking at the linguistic evidence (28 January 2016)

    This week at Prime Minister’s Questions, David Cameron used the term “a bunch of migrants” to describe refugees at a camp in Calais. He was subsequently criticised by Labour MPs and members of the general public on Twitter, and the story was reported on in mainstream newspapers like the Guardian and the Telegraph. Critics described ...

  • Spoken BNC2014 Early Access Data Grant Scheme – winning proposals (13 January 2016)

    Lancaster University’s ESRC-funded Centre for Corpus Approaches to Social Science (CASS) and Cambridge University Press are pleased to announce the recipients of the Spoken BNC2014 Early Access Data Grants. These successful applicants will receive exclusive early access to approximately five million words of the Spoken BNC2014 via CQPweb. They will be the first to ...

  • Spoken BNC2014 Early Access Data Grant Scheme – Applications now open (5 November 2015)

    Lancaster University’s ESRC-funded Centre for Corpus Approaches to Social Science (CASS) and Cambridge University Press are excited to announce the Spoken British National Corpus 2014 Early Access Data Grant scheme. Applications are now open for researchers at any level in the field of corpus linguistics and beyond to gain early access to a large subset ...

  • The Spoken British National Corpus 2014 – project update (11 August 2015)

    It has been a little over a year since CASS and Cambridge University Press announced a collaboration to compile a successor to the spoken component of the British National Corpus, the Spoken BNC2014. This will be the largest corpus of spoken British English since the original, with the advantage of being collected in the 2010s rather ...