Learn about the BNC2014, scan a book sample and contribute to the corpus…

On Saturday 12 May 2018, CASS hosted a small training event at Lancaster University for a group of participants from different universities in the UK. We talked about the BNC2014 project and discussed both the theoretical underpinnings and the practicalities of corpus design and compilation. Slides from the event are available as a PDF here.

The participants then tried out in practice what is involved in compiling a large general corpus such as the BNC2014. They selected and scanned samples of books from current British fiction, poetry and a range of non-fiction books (history, popular science, hobbies etc.). Once processed, these samples will become a part of the written BNC2014.

Here are some pictures from the event:

Carmen Dayrell and Vaclav Brezina before the event

Elena Semino welcoming participants

In the computer lab: Abi Hawtin helping participants

A box full of books

If you are interested in contributing to the written BNC2014, go to the project website to find out about the different ways in which you can participate in this exciting project.

The event was supported by ESRC grant no. EP/P001559/1.

The Spoken BNC2014 is now available!

On behalf of Lancaster University and Cambridge University Press, it gives us great pleasure to announce the public release of the Spoken British National Corpus 2014 (Spoken BNC2014).

The Spoken BNC2014 contains 11.5 million words of transcribed informal British English conversation, recorded by (mainly English) speakers between the years 2012 and 2016. The situational context of the recordings – casual conversation among friends and family members – is designed to make the corpus broadly comparable to the demographically-sampled component of the original spoken British National Corpus.

The Spoken BNC2014 is now accessible online in full, free of charge, for research and teaching purposes. To access the corpus, you should first create a free account on Lancaster University’s CQPweb server (https://cqpweb.lancs.ac.uk/) if you do not already have one. Once registered, please visit the BNC2014 website (http://corpora.lancs.ac.uk/bnc2014) to (a) sign the corpus’ end-user licence and (b) register your CQPweb account – following the instructions on the site. When you return to CQPweb, you will have access to the Spoken BNC2014 via the link that appears in the list of ‘Present-day English’ corpora. While access is initially only via the CQPweb platform, the underlying corpus XML files and associated metadata will be available for download in Autumn 2018.

The BNC2014 website also contains lots of useful information about the corpus, and in particular a downloadable manual and reference guide, which will be available soon. Further information, as well as the first research articles to use Spoken BNC2014 data, will be available in two in-press publications associated with the project: a special issue of the International Journal of Corpus Linguistics (due next month) and an edited collection in the Routledge ‘Advances in Corpus Linguistics’ series (due early 2018).

The BNC2014 does not end here – we are currently working on transcribing materials provided to us by the British Library to provide a substantial supplement to the corpus – find out more about that here: http://cass.lancs.ac.uk/?p=2241. For now, we will be waiting and watching with interest to see what work today’s corpus release stimulates. As ever with corpus data, it does not enable all questions to be answered, but it does allow a very wide range of questions to be investigated.

The Spoken BNC2014 research team would like to express our gratitude to all who have had a hand in the creation of the corpus, and hope that you enjoy exploring the data. We are, of course, keen to hear your feedback about the corpus; this, as well as any questions, can be directed to Robbie Love (r.m.love(Replace this parenthesis with the @ sign)lancaster.ac.uk) or Andrew Hardie (a.hardie(Replace this parenthesis with the @ sign)lancaster.ac.uk).

Spoken BNC2014 Symposium

On the afternoon of Monday 26th June, CASS hosted a special symposium to celebrate the upcoming public launch of the Spoken British National Corpus 2014 – a corpus which members of CASS and Cambridge University Press have spent the last three years compiling.

More than fifty guests attended, representing a mixture of Lancaster Summer Schools participants, members of the CASS Challenge Panel, and those who travelled to Lancaster just for the day.

To kick off the symposium, CASS Centre Director Andrew Hardie said a few words about the history of Corpus Linguistics at Lancaster University, and put the compilation of a new BNC into context against previous developments in the field. He expressed his delight at the interest in the Spoken BNC2014 project as evidenced by the number of guests who were in attendance for the symposium.

I then gave the first talk alongside Claire Dembry (from Cambridge University Press) and Andrew Hardie, as representatives of the Spoken BNC2014 research team which also includes Vaclav Brezina and Tony McEnery. We discussed the main methodological decisions we made when thinking about the design, data collection, transcription and processing of the corpus. Andrew then gave a quick demonstration of the corpus in CQPweb, showing how features including speaker IDs, overlaps and attribution confidence are displayed in the interface.

Following our talk came the first of four research presentations, all of which used (the early access subset of) the Spoken BNC2014. The first of these was a talk by Karin Aijmer (University of Gothenburg) about the intensifier fucking, which went down very well with the audience. Karin’s Spoken BNC2014 research, which also includes other intensifiers, will be published as a chapter in Brezina et al. (forthcoming).

After a short break for refreshments, Jacqueline Laws (University of Reading) presented research into verb-forming suffixation which she had undertaken with Chris Ryder and Sylvia Jaworska. Comparing the demographically-sampled component of the Spoken BNC1994 to the new Spoken BNC2014, she found that females now appear to produce more neologisms (e.g. favouritize, popify) than males. Laws et al.’s research will be published in a forthcoming special issue of the International Journal of Corpus Linguistics.

Susan Reichelt (Lancaster University) was next to present her work on producing sociolinguistically comparable subsets of both the original and new Spoken British National Corpora. She highlighted a point which I had touched upon in my earlier talk: that the compilation of the Spoken BNC2014 sought to strike a balance between direct comparability with the original corpus on the one hand, and methodological improvement on the other. The areas where improvement was favoured over comparability (e.g. the classification of speaker socio-economic status) ought to be considered especially when thinking about sociolinguistic analysis. Susan’s work is associated with the recently announced CASS SDA project.

Finally, Jonathan Culpeper and Mathew Gillings (Lancaster University) presented their work on politeness variation between the north and south of England. They aimed to assess the extent to which commonly held stereotypes about differences between northern and southern politeness were reflected in language use in both the original and new corpora as a single dataset. Their work will be published as a chapter in Brezina et al. (forthcoming).

My reaction as the organiser of the symposium was that there is definitely a sense of anticipation about the release of the Spoken BNC2014, which is planned to take place in the autumn. Furthermore, it was lovely to meet so many friendly and enthusiastic attendees. I am very grateful to each of the speakers for giving such interesting talks, and to all who attended – especially those who tweeted their reactions to the talks using the #BNC2014 hashtag! As one of my final duties as a member of CASS before moving on to pastures new, I am very glad that the symposium went as well as it did.

Introducing a new project with the British Library

Since 2012 the BBC has been working with the British Library to build a collection of intimate conversations from across the UK in the BBC Listening Project. Through its network of local radio stations, and with the help of a travelling recording booth, the BBC has captured, in high-quality audio, many conversations on a range of topics between people who are well known to one another.

For the past two years we have been discussing with the BBC and the British Library the possibility of using these recordings as the basis of a large-scale extension of our spoken BNC corpus. The Spoken BNC2014 has been built so far to reflect language in intimate settings – with recordings made in the home. This has led to a large and very useful collection of data but, without the resources of an organization such as the BBC, we were not able to roam the country with a sound recording booth to sample language from John o’Groats to Land’s End! By teaming up with the BBC and the British Library we can supplement this very useful corpus, which is strongly focused on a ‘hard to capture’ context (intimate conversations in the home), with another type of data: intimate conversations in a public situation, sampled from across the UK.

Another way in which the Listening data should prove helpful to linguists is that the data itself was captured in a recording studio as high quality audio recordings. Our hope is that a corpus based on this material will be of direct interest and use to phoneticians.

We have recently concluded our discussion with the British Library, which is archiving this material, and signed an agreement which will see CASS undertake orthographic transcription of the data. Our goal is to provide a high-quality transcription of the data which will be of use both to linguists and to members of the public who may wish to browse the collection. In doing this we will be building on our experience of producing the Trinity Lancaster Corpus of Spoken Learner English and the Spoken BNC2014.

We take our first delivery of recordings at the beginning of March and are very excited at the prospect of lifting the veil a little further on the fascinating topic of everyday conversation and language use. The plan is to transcribe up to 1000 of the recordings archived at the British Library. We will also be working to time-align the transcriptions with the sound recordings, and we are working closely with our strong phonetics team in the Department of Linguistics and English Language at Lancaster University to begin to assess the extent to which this new dataset could facilitate new work, for example, on the accents of the British Isles.

Our partners in the British Library are just as excited as we are – Jonnie Robinson, lead Curator for Spoken English at the British Library, says ‘The British Library is delighted to enable Lancaster to make such innovative use of the Listening Project conversations and we look forward to working with them to make the collection more accessible and to enhance its potential to support linguistic and other research enquiries’.

Keep an eye on the CASS website and Twitter feed over the next couple of years for further updates on this new project!

Spoken BNC2014 book announcement

We are excited to announce a forthcoming book which will be published as part of the Routledge Advances in Corpus Linguistics series. “Corpus Approaches to Contemporary British Speech: Sociolinguistic Studies of the Spoken BNC2014” (edited by Vaclav Brezina, Robbie Love and Karin Aijmer) will feature a collection of research which is currently being undertaken by the recipients of the Spoken BNC2014 Early Access data grants.

With exclusive early access to approximately five million words of Spoken BNC2014 data, the book’s contributors will present a range of innovative studies which each analyse the corpus from a sociolinguistic perspective.

The book is anticipated to appear shortly after the public release of the complete Spoken BNC2014 (approximately ten million words) in late 2017. The book agreement with Routledge joins a previously announced special issue of the International Journal of Corpus Linguistics (IJCL), which will feature a range of work by other recipients of the Spoken BNC2014 Early Access data grants.


The Spoken BNC2014 early access projects: Part 4

In January, we announced the recipients of the Spoken BNC2014 Early Access Data Grants. Over the next several months, they will use exclusive access to the first five million words of Spoken BNC2014 data to carry out a total of thirteen research projects.

In this series of blogs, we are excited to share more information about these projects, in the words of their authors.

In the fourth and final part of our series, read about the work of Tanja Hessner & Ira Gawlitzek, Karin Axelsson, Andrew Caines et al. and Tanja Säily et al.

Tanja Hessner and Ira Gawlitzek

University of Mannheim, Germany

Women speak in an emotional manner; men show their authority through speech! – A corpus-based study on linguistic differences showing which gender clichés are (still) true by analysing boosters in the Spoken BNC2014

Western world clichés claim that women are emotional and often exaggerate, which is reflected in their speech. In contrast, men’s language is said to be characterised by bluntness. Aiming to shed a bit more light on statements like these, this study is going to consider gender differences on the lexical level.

In order to discover whether, and if so to what extent, there really is a difference between female and male speakers, the phenomenon of boosters will be investigated in the Spoken BNC2014 early access subset. Boosters such as totally or absolutely are particularly appealing and suitable for analysing gender differences since they are extremely multifaceted and they are indicators not only of lively, but also of emotional and powerful speech. Appropriate boosters will be investigated not only with quantitative methods but also through qualitative analysis of the data.

Karin Axelsson

University of Gothenburg, Sweden

Canonical and non-canonical tag questions in the Spoken BNC2014: What has happened since the original BNC?

What is happening to tag questions in British everyday conversation? Are canonical tag questions, where the form of the tag reflects that of the preceding clause (as in She won’t come, will she?), on the way out as the use of innit and other invariant tags is spreading? Who uses innit in 2014? The use of tag questions in the Spoken BNC2014 early access subset will be compared to the use in the demographic part of the original Spoken BNC reflecting the language of the early 1990s.

Andrew Caines¹, Michael McCarthy² and Paula Buttery¹

¹University of Cambridge, UK

²University of Nottingham, UK

‘You still talking to me?’ The zero auxiliary progressive in spoken British English, twenty years on

With early access to a subset of the Spoken BNC2014, we will be able to assess whether a supposedly ‘ungrammatical’ construction has become more frequently used in conversational British English over the past 20 years. The construction in question is the ‘zero auxiliary’ – for example, the progressive aspect construction may be used with an -ing verb form alone (“you talking to me?”, “What you doing?”, “We going to town”) whereas the standard rule is to combine an auxiliary verb (BE or HAVE) with the -ing form.

In the original Spoken BNC, recorded in the early 1990s, the zero auxiliary occurred in one in twenty progressive constructions, a rate that rose to one in three if second person interrogatives (You talking to me? etc.) were considered alone. Moreover, younger working-class speakers were more likely to use the zero auxiliary than older middle-class speakers. We will investigate how these usage rates compare to the Spoken BNC2014, in the process updating the demographics of zero auxiliary use as well.

Tanja Säily¹, Victoria González-Díaz² and Jukka Suomela³

¹University of Helsinki, Finland

²University of Liverpool, UK

³Aalto University, Finland

Variation in the productivity of adjective comparison

The functional competition between inflectional (‑er) and periphrastic (more) comparative strategies in English has received a great deal of attention in corpus-based research. A key area of competition remains relatively unexplored, however: the productivity of either comparative strategy, or how diversely they are used with different adjectives. The received wisdom is that inflection is fully productive, so we might expect to find no variation within the productivity of ‑er. However, recent research using new methods shows sociolinguistic variation in the productivity of extremely productive derivational suffixes. Whether the same variation applies to the productivity of inflectional processes remains an open question.

On the basis of the Spoken BNC2014 early access subset, our project will analyse intra- and extra-linguistic variation in the productivity of inflectional and periphrastic comparative strategies. Intra-linguistic factors include syntactic position, modification preferences, length and derivational type of the adjective. The extra-linguistic determinants focus on gender, age, socio-economic status, conversational setting and roles of the interlocutors. Our research constitutes a timely contribution to current knowledge of adjective comparison and morphological theory-building. If (a) variation in the productivity of inflectional comparison is found and (b) similar change in the productivity of both derivational and inflectional processes is observed, this will support our hypothesis that there is a derivation-to-inflection cline rather than a sharp divide.
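
One simple way to picture "how diversely" each comparative strategy is used is to count the distinct adjective types it occurs with, alongside the total number of tokens. The sketch below does this for a handful of invented comparative forms. The data, the function name and the choice of type and hapax counts as measures are all assumptions made for illustration; the project itself may well use different methods.

```python
from collections import Counter

# Invented (strategy, adjective) observations; not drawn from any corpus.
OBSERVATIONS = [
    ("er", "big"), ("er", "big"), ("er", "nice"), ("er", "old"),
    ("er", "old"), ("er", "cheap"),
    ("more", "interesting"), ("more", "interesting"),
    ("more", "expensive"), ("more", "likely"),
]

def productivity_summary(observations):
    """Per strategy: token count, type count and hapax count (types seen once)."""
    by_strategy = {}
    for strategy, adjective in observations:
        by_strategy.setdefault(strategy, Counter())[adjective] += 1
    summary = {}
    for strategy, counts in by_strategy.items():
        summary[strategy] = {
            "tokens": sum(counts.values()),
            "types": len(counts),
            "hapaxes": sum(1 for c in counts.values() if c == 1),
        }
    return summary

for strategy, stats in productivity_summary(OBSERVATIONS).items():
    print(strategy, stats)
```

Because type counts grow with sample size, any comparison between speaker groups that contribute different amounts of data would need to take this into account.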

Check back soon for more updates on the Spoken BNC2014 project!

The Spoken BNC2014 early access projects: Part 3

In Part 3 of our series, read about the work of Karin Aijmer, Kazuki Hata et al. and Laura Paterson.

Karin Aijmer

University of Gothenburg, Sweden

Investigating intensifiers in the Spoken BNC2014

Intensifiers undergo rapid changes. Old ones may go out of fashion and be replaced by new ones even in a short diachronic perspective. They should therefore be studied in up-to-date spoken material. This project will describe ‘new’ intensifiers (or new developments of intensifiers) such as so (cool), fucking, damn, dead, enough and the contexts in which they are used. What, for example, do they collocate with? Who are the typical users?

The aim of the article, which uses data from the Spoken BNC2014 early access subset (EAS), is to study recent or on-going changes in the area of intensification. Intensifiers are interesting to study because they have a tendency to lose ground and may be replaced by other intensifiers even in a short diachronic perspective. Intensifiers have earlier been studied on the basis of the spoken part of the British National Corpus, and access to the EAS will make it possible to compare the frequencies of intensifiers across time. On the basis of the corpus data it will also be possible to give information about the speakers (e.g. whether they are teenagers or adults, and their gender and social class).
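
Comparing frequencies across two corpora usually means normalising raw counts by corpus size, since the corpora are rarely the same length. A minimal sketch follows; the counts, corpus sizes and function name are placeholders invented for illustration, not findings from either corpus.

```python
def per_million(raw_count, corpus_size):
    """Normalised frequency: occurrences per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count * 1_000_000 / corpus_size

# Placeholder figures for illustration only; not real corpus counts.
old_count, old_size = 250, 5_000_000
new_count, new_size = 900, 4_500_000

old_rate = per_million(old_count, old_size)
new_rate = per_million(new_count, new_size)
print(f"earlier corpus: {old_rate:.1f} per million words")
print(f"later corpus:   {new_rate:.1f} per million words")
```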

Kazuki Hata, Yun Pan and Steve Walsh

Newcastle University, UK

Talking the talk, walking the walk: interactional competence in and out

Our project aims to characterise interactional competence through a comparison of casual conversation and institutional talk, two distinct genres. The proposed study will build on an ongoing project using the NUCASE corpus (Newcastle University Corpus of Academic Spoken English), led by the School of Education, Communication and Language Sciences, Newcastle University. From our analysis of the NUCASE data, we have identified specific features of interactional competence which operate in different academic contexts. Interactional competence, across a range of academic disciplines, can be characterised by identifying the key linguistic and interactional features which promote engagement and maximise ‘learning’ and ‘learning opportunities’.

The proposed study would extend findings from the NUCASE study by comparing two corpora, and by highlighting the ways in which interactional competence operates in both formal and informal settings. We see the Spoken BNC2014 early access subset as an ideal source to accomplish our research aim, due to its geographical and functional features, offering a unique opportunity to study speakers’ interactional competence in different settings, with a particular focus on the ‘organising features’ of spoken interactions. We anticipate that the proposed study would bring into question some of the recent claims from functional/interactional linguistic studies, regarding the textual and interpersonal functions of several tokens, and provide a better understanding of the context-shaped/renewing nature of discourse across interactional contexts.

Laura Paterson

Lancaster University, UK

‘You can just give those documents to myself’: Untriggered reflexive pronouns in 21st century spoken British English

Reflexive pronouns (myself, herself, etc.) must share reference with another grammatical unit in order to fulfil their syntactic criteria: in the sentence ‘The cat washes herself’, the noun phrase the cat and the reflexive pronoun herself represent the same entity and share a syntactic bond. However, despite syntactic constraints, reflexive pronouns occur without coreferent NPs in some varieties of English. In ‘You can just give those documents to myself’, the pronoun you and the reflexive pronoun myself cannot be coreferent and have different real-world referents. Reflexives occurring without coreferent noun phrases are classed as ‘untriggered’ and have traditionally been deemed ungrammatical. However, untriggered reflexives can be understood.

Using the Spoken BNC2014 early access subset, I will investigate the use of untriggered reflexives in 21st century spoken British English, asking:

  1. Do untriggered reflexives occur in particular syntactic positions?
  2. Does the use of untriggered reflexives correlate with use of a particular grammatical person?
  3. Does the use of untriggered reflexives correlate with particular demographic groups?
  4. How does the use of untriggered reflexives compare with the use of reflexives in 21st century spoken British English?

Check back soon for Part 4!

The Spoken BNC2014 early access projects: Part 2

In Part 2 of our series, read about the work of Chris Ryder et al., Andreea Calude and Barbara McGillivray et al.

Chris Ryder, Jacqueline Laws and Sylvia Jaworska

University of Reading, UK

From oldies to selfies: A diachronic corpus-based study into changing productivity patterns in British English suffixation

The data from the Spoken BNC2014 early access subset will provide a unique opportunity to examine changes that have occurred in affix use in spoken British English over a twenty-year period; for example, the word selfie has only entered general usage since the invention of the iPhone. Using the recently developed MorphoQuantics database containing complex word data for 222 word-final affixes from the demographically sampled subset of the original Spoken BNC, direct comparisons can be made between old and new datasets, focussing on suffixation patterns, changes in productivity, and trends that demonstrate the shifts in semantic scope of individual suffixes. These features will be analysed chiefly through an examination (both quantitative and qualitative) of neologisms within the data, specifically regarding their regularity of construction, occurrence, and meaning.

This study is just one example of the diachronic morphological analyses that will be made available through a comparison of the Spoken BNC2014 EAS and the Spoken BNC, by utilising the categorisation system provided by MorphoQuantics.

Andreea Calude

University of Waikato, New Zealand

Sociolinguistic variation in cleft constructions: a quantitative corpus study of spontaneous conversation

This project concerns links between the use of various grammatical constructions and sociolinguistic variation: for example, is grammar used differently by men and women, or by younger and older speakers? We know that such variation can be observed for certain phonological features (e.g., some vowel sounds) and for certain pragmatic constructions (e.g., discourse markers and new and given information), but as regards grammatical features, the answer remains largely unknown or at best vague.

I intend to use the Spoken BNC2014 early access subset to investigate cleft constructions from a sociolinguistic variationist perspective, with the aim of uncovering (potential) systematic syntactic variation across age, gender, dialect, and socio-economic status. Clefts constitute the most frequently used focusing strategy in English, with demonstrative clefts being among the most common in spontaneous conversation, for example: “That is what I want to study”, “This is where I was born”. Despite intense diachronic and synchronic study of the structure and function of clefts in English, virtually nothing is known about the relationship between cleft use and sociolinguistic variation.

The Spoken BNC2014 data will be coded for all demonstrative clefts using a combination of manual and automatic detection, and each construction identified will be attributed to a particular speaker profile (in terms of their sociolinguistic features). Three linguistic features will also be coded for each construction, namely discourse function, reference direction (cataphoric or anaphoric), and information structure (amount of new and given information included). The data will be analysed using a mixed effects generalised linear regression model.

Barbara McGillivray¹, Gard Buen Jenset¹ and Michael Rundell²

¹University of Oxford, UK

²Lexicography MasterClass, UK

The dative alternation revisited: fresh insights from contemporary spoken data

A well-known feature of English grammar is the dative alternation, whereby a verb may be used in an SVOO construction (Give me the money) or in the pattern SVO followed by a PP with the preposition to (Give the money to me). This is quite a well-researched topic, and generalizations have been made about the factors influencing a writer’s choice of one construction or another, and about which verbs show a preference for one of these patterns over the other. However, most of the studies published to date draw either on introspection or on data from written sources. The availability of contemporary, unscripted spoken data takes us into new territory, and offers an exciting opportunity to revisit this topic.

Our plan is to use the data from the Early Access Scheme to investigate verbs whose argument structure preferences include the dative alternation. Once we have all the relevant corpus data from the Spoken BNC2014 early access subset, we will analyse it using state-of-the-art multivariate statistical techniques, in order to account for the interplay of all the potentially significant variables, whether lexical, semantic, syntactic, or social. The proposed study thus exploits many of the unique features of this dataset, including the metadata on speakers and the USAS semantic tagging, to answer questions concerning the possible influence of semantic categories, socio-economic factors, gender, dialect and age, as well as linguistic features, on a speaker’s preferences. Once the study is complete, there would be opportunities for fresh comparative studies, either with the original Spoken BNC or with contemporary written data.

Check back soon for Part 3!

The Spoken BNC2014 early access projects: Part 1

In Part 1 of our series, read about the work of Deanna Wong, Jonathan Culpeper and Robert Fuchs.

Deanna Wong

Macquarie University, Australia

Investigating British English backchannels in the Spoken BNC2014

Have you ever listened to someone listening? While we might expect that listeners are silent, it turns out that listeners have a lot to say. Mostly, this listener speech happens at the same time as when the speaker is talking, but listeners are not talking to interrupt the speaker. Instead, listeners signal to the speaker that they are paying attention, that they agree with what the speaker has to say, and sometimes, that they are ready to have their turn at talking. The words that listeners use to signal these things can range from a simple mm to whole sentences. To make things even more interesting, how listeners listen varies across different parts of the world.

Sociolinguists use the term ‘backchannels’ to describe listener speech. Early research identified backchannels by careful investigations of individual conversations. That analysis took time, though, and it was not until researchers were able to access language corpora that we started to get a sense of the nature of backchannels in conversation on a larger scale.

However, looking for evidence of backchannels in a corpus has its own challenges. If the actual language used by listeners is to be uncovered, we cannot assume that backchannels take a specific form. Otherwise, we might miss something important! The key to unlocking this information is to use corpus annotation. Annotation is simply a way of marking what is happening in the talk. For example, corpus annotation can be used to indicate who is speaking, and whether they are speaking at the same time so that their speech overlaps.

In my investigation into the Spoken BNC2014 early access subset, I will be using annotation that marks overlapping speech to help identify potential backchannels in conversations from across the United Kingdom. The size of the corpus, and the accompanying information about its speakers, will add to our understanding of how British speakers backchannel. It will also help us to compare their backchannels to those produced by speakers of English in other parts of the world.
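
As a rough illustration of the idea described above, the sketch below pulls out every utterance marked as overlapping and tallies the forms that occur, without assuming in advance what a backchannel looks like. The XML layout (`<u>` elements carrying `who` and `trans` attributes), the sample transcript and the function name are assumptions made for this example; they are not a description of the released corpus format or of the method used in the study.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical transcript fragment: element and attribute names are assumed.
SAMPLE = """
<text id="demo">
  <u n="1" who="S0001" trans="nonoverlap">so I went to the shop and it was closed</u>
  <u n="2" who="S0002" trans="overlap">mm</u>
  <u n="3" who="S0001" trans="nonoverlap">which was really annoying</u>
  <u n="4" who="S0002" trans="overlap">oh no</u>
  <u n="5" who="S0002" trans="overlap">mm</u>
</text>
"""

def overlapping_utterances(xml_string):
    """Return (speaker, text) pairs for every utterance marked as overlapping."""
    root = ET.fromstring(xml_string.strip())
    results = []
    for u in root.iter("u"):
        if u.get("trans") == "overlap":
            # Join all text inside the utterance and normalise whitespace.
            text = " ".join("".join(u.itertext()).split())
            results.append((u.get("who"), text))
    return results

candidates = overlapping_utterances(SAMPLE)
form_counts = Counter(text for _, text in candidates)
for form, count in form_counts.most_common():
    print(form, count)
```

A list like this would only be a starting point: overlapping speech also includes interruptions and competing turns, so the candidates would still need to be inspected by hand.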

Jonathan Culpeper

Lancaster University, UK

Politeness variation in England

The stereotype of British politeness is pervasive, and, moreover, it is usually linked to what people say. Take, as an example, this advice on British stereotypes for study abroad students:

The way that British people speak and the language that we use is also considered quite polite. The language that many people use, including lots of phrases like ‘please’, ‘thank you’, ‘pardon’ or ‘excuse me’ and ‘would you mind…’ certainly back this up […]


In fact, the first item in the list, please, seems to be elevated by many English parents to the supernatural – the “magic word” for achieving successful requests. Similarly, in the earthly world of academia, a large number of studies have found evidence that present-day English politeness is often characterised by so-called “off-record” or “negative politeness” – it’s all about being indirect, showing respect for others’ privacy, freedom from disturbance, and so on. In the example, the expressions ‘would you mind’, ‘pardon me’ and ‘excuse me’ all readily fit this function. To these, one might add could you […], seemingly, the most frequent way in which requests are performed in British English.

But is all this true? For starters, there’s a lingering concern that some British people may actually use other items, perhaps functional alternatives, just as frequently or even more so. For instance, thank you is one expression, but what about ta or cheers? More substantially, for anybody living in the north of England, the idea of British indirectness does not entirely ring true. Indirectness is somewhat stand-offish and cold; not reflective of the much-proclaimed northern warmth and friendliness. Consider this opinion (written by a Scotsman who has lived in various parts of Britain):

There is definitely a North/South divide when it comes to politeness. Having lived on the South coast of England and then Scotland, it is very noticeable that people are more friendly and polite the further North you go in Britain


Politeness here is connected to friendliness. Maybe academia is orienting to a particular and different cultural stereotype of politeness, one based on a British southern perspective. Or maybe the idea of northern friendly politeness is a stereotype itself and has no basis in what people actually do.

This study sets out to examine these issues. I intend it to be a contribution to one of the newest sub-fields of linguistic pragmatics, variational pragmatics, which combines pragmatics and dialectology. One of the greatest impediments to doing such a study has been the lack of large quantities of spoken, especially conversational, data taken from across Britain. The Spoken British National Corpus 2014 early access subset offers a solution.

Robert Fuchs

University of Münster, Germany

Recent change in the sociolinguistics of intensifiers in British English

As social beings and speakers of a language we are extremely good at putting people into boxes – female and male, young and old, old-fashioned and hip. One of the many clues that allow us to make these (sometimes in fact unwarranted) assumptions is sociolinguistic variation. Whether and how women and men differ in how they speak, for example, is a hotly debated topic inside and outside academia.

This study approaches this topic from two novel angles. Firstly, several sociolinguistic factors (age, social class, gender of speaker, gender of interlocutor and others) are considered in interaction with each other. Secondly, the study also looks at change across time, from the 1990s to the 2010s. For example, given the change in attitudes concerning what roles women and men are supposed to fulfil in society, I expect that any gender differences present in the 1990s will have decreased by the 2010s. The variable that the study investigates is the usage of so-called intensifiers (as in *very* good, *so* cool), which are said to occur more frequently in female than male speech.

Check back soon for Part 2!

What’s wrong with “a bunch of migrants”? Looking at the linguistic evidence

This week at Prime Minister’s Questions, David Cameron used the term “a bunch of migrants” to describe refugees at a camp in Calais. He was subsequently criticised by Labour MPs and members of the general public on Twitter, and the story was reported in mainstream newspapers like the Guardian and the Telegraph. Critics described his comments as “dehumanising”, “callous” and “inflammatory”.

Something about David Cameron saying the words “bunch of” to describe a group of people caused a furore – but what was it? Is this how people normally use this phrase, or is this a noteworthy departure from the norm?

Here at CASS we have the unique opportunity to analyse a very large set of everyday conversations between speakers of British English from all over the UK, which participants have been recording in their homes and sending to us to be transcribed. With the transcriptions, we can use computer software to analyse how words and phrases are commonly used across the entire country.

I searched through 4.5 million words of present-day conversation to find out how people in the UK normally use the phrase “bunch of”. I found that “people”, “flowers” and “things” are the most likely words to be described in this way. Beyond this, there are several other words which refer to groups of people:

“kids”, “volunteers”, “retards”, “losers”, “lads”, “individuals”, “friends”, “dickheads”, “dancers”, “Aussies”, “alcoholics”, “thieving sods” and “thieving fuckers”.

Absent from this list is the word “migrants”, which does not occur in this context. The evidence suggests that people do often use “bunch of” to describe groups of people negatively or with distaste. Therefore, the upset caused by Cameron’s use of the phrase “a bunch of migrants” is perhaps understandable.
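
For readers curious about what such a search involves, the idea can be sketched in a few lines of Python. This is only an illustration run on invented sentences; it is not the software or the data used for the analysis above, and the function name `bunch_of_collocates` and the sample utterances are hypothetical.

```python
import re
from collections import Counter


def bunch_of_collocates(utterances):
    """Count the word that immediately follows 'bunch of' in each utterance."""
    pattern = re.compile(r"\bbunch of (\w+)", re.IGNORECASE)
    counts = Counter()
    for utterance in utterances:
        for match in pattern.finditer(utterance):
            counts[match.group(1).lower()] += 1
    return counts


# Invented toy utterances, NOT taken from the corpus.
sample = [
    "she bought a bunch of flowers",
    "there was a whole bunch of people outside",
    "a bunch of people turned up late",
    "I have a bunch of things to do",
]

counts = bunch_of_collocates(sample)
# List the following words from most to least frequent.
for word, frequency in counts.most_common():
    print(word, frequency)
```

A sketch like this only looks at the single word directly after the phrase, so two-word items such as “thieving sods” would be counted under their first word. A full corpus tool would also normalise frequencies and let the researcher read each example in context.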

We are still collecting recordings from speakers all over the UK. For information on how to contribute to this project, which is led by Lancaster University and Cambridge University Press, please visit the Spoken BNC2014 website.