What’s wrong with “a bunch of migrants”? Looking at the linguistic evidence

This week at Prime Minister’s Questions, David Cameron used the term “a bunch of migrants” to describe refugees at a camp in Calais. He was subsequently criticised by Labour MPs and members of the general public on Twitter, and the story was reported in mainstream newspapers such as the Guardian and the Telegraph. Critics described his comments as “dehumanising”, “callous” and “inflammatory”.

Something about David Cameron saying the words “bunch of” to describe a group of people caused a furore – but what was it? Is this how people normally use this phrase, or is this a noteworthy departure from the norm?

Here at CASS we have the unique opportunity to analyse a very large set of everyday conversations between speakers of British English from all over the UK. Participants have been recording these conversations in their homes and sending them to us to be transcribed. With the transcriptions, we can use computer software to analyse how words and phrases are commonly used across the entire country.

I searched through 4.5 million words of present-day conversation to find out how people in the UK normally use the phrase “bunch of”. I found that “people”, “flowers” and “things” are the words that most often follow it. Beyond these, there are several other words which refer to groups of people:

“kids”, “volunteers”, “retards”, “losers”, “lads”, “individuals”, “friends”, “dickheads”, “dancers”, “Aussies”, “alcoholics”, “thieving sods” and “thieving fuckers”.

Absent from this list is the word “migrants”, which does not occur after “bunch of” anywhere in the data. Many of the words that do occur, such as “retards”, “losers”, “dickheads” and “thieving sods”, are openly negative, which suggests that people often use “bunch of” to describe groups of people negatively or with distaste. The upset caused by Cameron’s use of the phrase “a bunch of migrants” is therefore perhaps understandable.
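The post describes this search only as using "computer software". As a rough illustration, and not the tool or method actually used at CASS, the core idea can be sketched in Python: find every occurrence of “bunch of”, note the word immediately after it, and count how often each word appears. The function name `bunch_of_collocates` and the sample text are hypothetical and invented for this example. They are not drawn from the Spoken BNC2014.

```python
import re
from collections import Counter


def bunch_of_collocates(text: str) -> Counter:
    """Count the word immediately following 'bunch of' (case-insensitive).

    Hypothetical helper for illustration only; real corpus tools offer
    far richer querying than this simple word-by-word scan.
    """
    # Crude tokenisation: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    # Stop two tokens early so tokens[i + 2] always exists.
    for i in range(len(tokens) - 2):
        if tokens[i] == "bunch" and tokens[i + 1] == "of":
            counts[tokens[i + 2]] += 1
    return counts


# Invented sample sentences, not corpus data.
sample = ("What a bunch of flowers! They're a bunch of lads, "
          "just a bunch of people really. Another bunch of flowers.")
print(bunch_of_collocates(sample).most_common(2))  # [('flowers', 2), ('lads', 1)]
```

Because this sketch looks only one word to the right, a phrase such as “thieving sods” would be counted under “thieving”. A dedicated corpus tool would typically support wider windows or part-of-speech-aware patterns.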

We are still collecting recordings from speakers all over the UK. For information on how to contribute to this project, which is led by Lancaster University and Cambridge University Press, please visit the Spoken BNC2014 website.

Spoken BNC2014 Early Access Data Grant Scheme – winning proposals

Lancaster University’s ESRC-funded Centre for Corpus Approaches to Social Science (CASS) and Cambridge University Press are pleased to announce the recipients of the Spoken BNC2014 Early Access Data Grants. These successful applicants will receive exclusive early access to approximately five million words of the Spoken BNC2014 via CQPweb. They will be the first to conduct research using the data, and their papers will be published in 2017 to coincide with the release of the full corpus.

The successful applicants, their institutions, and the research they intend to undertake are:

Karin Aijmer (Gothenburg): Investigating intensifiers in the Spoken BNC2014

Karin Axelsson (Gothenburg): Canonical and non-canonical tag questions in the Spoken BNC2014: What has happened since the original BNC?

Andrew Caines (Cambridge), Michael McCarthy (Nottingham) and Paula Buttery (Cambridge): ‘You still talking to me?’ The zero auxiliary progressive in spoken British English, twenty years on

Andreea Simona Calude (Waikato): Sociolinguistic Variation in Cleft Constructions – a quantitative corpus study of spontaneous conversation

Jonathan Culpeper (Lancaster): Politeness variation in England

Robert Fuchs (Münster): Recent Change in the sociolinguistics of intensifiers in British English

Kazuki Hata, Yun Pan and Steve Walsh (Newcastle): Talking the talk, walking the walk: interactional competence in and out

Tanja Hessner and Ira Gawlitzek (Mannheim): Women speak in an emotional manner; men show their authority through speech! – A corpus-based study on linguistic differences showing which gender clichés are (still) true by analysing boosters in the Spoken BNC2014

Barbara McGillivray (Oxford), Gard Jenset (Oxford) and Michael Rundell (Lexicography MasterClass): The dative alternation revisited: fresh insights from contemporary spoken data

Laura Paterson (Lancaster): ‘You can just give those documents to myself’: Untriggered reflexive pronouns in 21st century spoken British English

Chris Ryder, Jacqueline Laws and Sylvia Jaworska (Reading): From oldies to selfies: A diachronic corpus-based study into changing productivity patterns in British English suffixation

Tanja Säily (Helsinki), Victoria González-Díaz (Liverpool) and Jukka Suomela (Aalto): Variation in the productivity of adjective comparison

Deanna Wong (Macquarie): Investigating British English backchannels in the Spoken BNC2014

Thank you to everyone who applied, and congratulations to the successful applicants. Check back soon for more details on the Early Access Data Grant Scheme research.

Spoken BNC2014 Early Access Data Grant Scheme – Applications now open

Lancaster University’s ESRC-funded Centre for Corpus Approaches to Social Science (CASS) and Cambridge University Press are excited to announce the Spoken British National Corpus 2014 Early Access Data Grant scheme.

Applications are now open for researchers at any level in the field of corpus linguistics and beyond to gain early access to a large subset of the Spoken BNC2014, which is currently being compiled and is due for release in late 2017. Successful applicants will write a paper based on their proposed research for exclusive publication (subject to peer review) in either a special issue of the International Journal of Corpus Linguistics or an edited collection.

We invite proposals for interesting and innovative research that would use approximately five million words of the upcoming Spoken BNC2014 as its primary source of data.

Successful applicants will gain access to the data via the CQPweb platform (cqpweb.lancs.ac.uk). Standard CQPweb functionality will be provided, including annotation (POS tagging, lemmatisation, semantic tagging), along with one new feature: the ability to search the corpus by categories of speaker metadata such as gender, age, dialect and socio-economic status.

Proposals can approach the data from any theoretical angle, provided corpus methodologies are used and the research can be carried out within the affordances of CQPweb. Successful applicants will receive access to the data in February 2016 with a deadline for full paper submission in October 2016. Subject to peer review, papers will be published in one of the two Spoken BNC2014 launch publications in 2017 (a special issue of the International Journal of Corpus Linguistics has been agreed and a thematic edited collection is being planned).

This is a fantastic opportunity to work with the first very large, general corpus of informal British English conversation created since the original BNC more than twenty years ago. Successful applicants will get access to a large subset of the Spoken BNC2014 eighteen months before the full corpus is released, and will be the very first scholars to undertake and publish research based on this new dataset.

More details about the terms of the data grant scheme can be found in the application form. To apply, download and complete the application form and email it to Robbie Love (r.m.love(Replace this parenthesis with the @ sign)lancaster.ac.uk). The deadline for applications is Friday 11th December 2015.

The Spoken British National Corpus 2014 – project update

It has been a little over a year since CASS and Cambridge University Press announced a collaboration to compile a successor to the spoken component of the British National Corpus: the Spoken BNC2014. This will be the largest corpus of spoken British English since the original, with the advantage of being collected in the 2010s rather than the 1990s, providing an updated snapshot of spoken language in the UK. By including a set of recordings already gathered by Cambridge University Press before our collaboration began, we plan for the corpus to contain data from the years 2012–2016. As well as being the year in which the project was announced, 2014 will be the median year of the planned data range, so we chose it for the working title of the project: the Spoken BNC2014.

Since our announcement, we have been hard at work: advertising the project nationally, collecting recordings from speakers from all over the UK, transcribing the data, conducting methodological investigations, and presenting our work so far at corpus linguistics conferences. At ICAME 36 in May we described the development of the Spoken BNC2014 transcription scheme, and at Corpus Linguistics 2015 in July we gave an overview of the data collection methodology as well as presenting new research on speaker identification in transcription. All of this activity continues as we work towards making the corpus freely and publicly available in the year 2017.

So far, we have gathered nearly 700 recordings, amounting to an estimated six million words of informal conversational data. Most recordings feature two or three speakers, and about a quarter contain four or more. The balance of speaker gender is fairly even, and we have gathered data from a wide range of ages, though at the moment the 19-29 year olds have a clear lead! In England we have gathered recordings from a great range of self-reported dialects, and we now plan to focus more heavily on gathering recordings from Wales, Scotland and Northern Ireland. The word cloud of self-reported conversation topics gives a first look at the range of things that users can expect to find being discussed in the corpus.

We are very pleased with the progress of the project so far, and we look forward to releasing the corpus texts publicly once they are complete. In the meantime, as announced at CL2015, we will be offering the opportunity to apply for pre-release data grants later this year. More information about the data grants will be announced in the near future.

The Spoken BNC2014 project features in the Daily Mail

The recently announced collaboration between Cambridge University Press and CASS, the Spoken BNC2014 project, has made headlines in the Daily Mail.

The article, entitled “No longer marvellous – now we’re all awesome: Britons are using more American words because traditional English is in decline”, describes the preliminary findings of the project, which is still in its early stages.

To participate in the project, native British English speakers from all over the UK can record their conversations and send them to us as MP3 files. For each hour of good-quality recordings we receive, along with all associated consent forms and information sheets completed correctly, we will pay £18. Individual recordings do not have to be an hour long; participants may submit, for example, two 30-minute recordings or three 20-minute recordings, and will receive £18 for each hour in total.

To register your interest in participating, please email corpus(Replace this parenthesis with the @ sign)cambridge.org

Spoken BNC2014 project announcement


We are excited to announce that the ESRC-funded Centre for Corpus Approaches to Social Science (CASS) at Lancaster University and Cambridge University Press have agreed to collaborate on the compilation of a new, publicly accessible corpus of spoken British English called the ‘Spoken British National Corpus 2014’ (the Spoken BNC2014).

The aim of the Spoken BNC2014 project, which will be led jointly by Lancaster University’s Professor Tony McEnery and Cambridge University Press’ Dr Claire Dembry, is to compile a very large collection of recordings of real-life, informal, spoken interactions between people whose first language is British English. These will then be transcribed and made available publicly for a wide range of research purposes.

We aim to encourage people from all over the UK to record their interactions and send them to us as MP3 files. For each hour of good-quality recordings we receive, along with all associated consent forms and information sheets completed correctly, we will pay £18. Individual recordings do not have to be an hour long; participants may submit, for example, two 30-minute recordings or three 20-minute recordings, and will receive £18 for each hour in total.

The collaboration between CASS at Lancaster University and Cambridge University Press brings together the best resources available for this task. Cambridge University Press has extensive experience of collecting very large English corpora, and it already has the infrastructure in place to undertake such a large compilation project. CASS at Lancaster University has the linguistic research expertise needed to ensure that the Spoken BNC2014 will be as useful and accessible as possible for a wide range of purposes. The academic community will benefit from access to a large new corpus of spoken British English that is balanced according to a selection of useful demographic criteria, including gender, age and socio-economic status. This opens the door to all kinds of research projects, including comparisons of the Spoken BNC2014 with older spoken corpora.

CASS at Lancaster University and Cambridge University Press are very excited to launch the Spoken BNC2014 project, and we look forward to sharing the corpus as widely as possible once it is complete.

To contribute to the Spoken BNC2014 project as a participant please email corpus(Replace this parenthesis with the @ sign)cambridge.org for more information.