Spoken BNC2014 book announcement

We are excited to announce a forthcoming book which will be published as part of the Routledge Advances in Corpus Linguistics series. “Corpus Approaches to Contemporary British Speech: Sociolinguistic Studies of the Spoken BNC2014” (edited by Vaclav Brezina, Robbie Love and Karin Aijmer) will feature a collection of research which is currently being undertaken by the recipients of the Spoken BNC2014 Early Access data grants.

With exclusive early access to approximately five million words of Spoken BNC2014 data, the book’s contributors will present a range of innovative studies which each analyse the corpus from a sociolinguistic perspective.

The book is expected to follow shortly after the public release of the complete Spoken BNC2014 (approximately ten million words) in late 2017. The book agreement with Routledge joins a previously announced special issue of the International Journal of Corpus Linguistics (IJCL), which will feature a range of work by other recipients of the Spoken BNC2014 Early Access data grants.

 

The Spoken BNC2014 early access projects: Part 3

In January, we announced the recipients of the Spoken BNC2014 Early Access Data Grants. Over the next several months, they will use exclusive access to the first five million words of Spoken BNC2014 data to carry out a total of thirteen research projects.

In this series of blogs, we are excited to share more information about these projects, in the words of their authors.

In Part 3 of our series, read about the work of Karin Aijmer, Kazuki Hata et al. and Laura Paterson.


Karin Aijmer

University of Gothenburg, Sweden

Investigating intensifiers in the Spoken BNC2014

Intensifiers undergo rapid changes. Old ones may go out of fashion and be replaced by new ones even in a short diachronic perspective. They should therefore be studied in up-to-date spoken material. This project will describe ‘new’ intensifiers (or new developments of intensifiers) such as so (cool), fucking, damn, dead, enough and the contexts in which they are used. What, for example, do they collocate with? Who are the typical users?

The aim of the article using data from the Spoken BNC2014 early access subset is to study recent or on-going changes in the area of intensification. Intensifiers are interesting to study because they have a tendency to lose ground and may be replaced by other intensifiers even in a short diachronic perspective. Intensifiers have earlier been studied on the basis of the spoken part of the British National Corpus, and access to the EAS will make it possible to compare the frequencies of intensifiers across time. On the basis of the corpus data it will also be possible to give information about the speakers (e.g. whether they are teenagers or adults, gender and social class of the speakers).


Kazuki Hata, Yun Pan and Steve Walsh

Newcastle University, UK

Talking the talk, walking the walk: interactional competence in and out

Our project aims to characterise interactional competence through a comparison of casual conversation and institutional talk, two distinct genres. The proposed study will build on an ongoing project using the NUCASE corpus (Newcastle University Corpus of Academic Spoken English), led by the School of Education, Communication and Language Sciences, Newcastle University. From our analysis of the NUCASE data, we have identified specific features of interactional competence which operate in different academic contexts. Interactional competence, across a range of academic disciplines, can be characterised by identifying the key linguistic and interactional features, which promote engagement and maximise ‘learning’ and ‘learning opportunities’.

The proposed study would extend findings from the NUCASE study by comparing two corpora, and by highlighting the ways in which interactional competence operates in both formal and informal settings. We see the Spoken BNC2014 early access subset as an ideal source to accomplish our research aim, due to its geographical and functional features, offering a unique opportunity to study speakers’ interactional competence in different settings, with a particular focus on the ‘organising features’ of spoken interactions. We anticipate that the proposed study would bring into question some of the recent claims from functional/interactional linguistic studies, regarding the textual and interpersonal functions of several tokens, and provide a better understanding of the context-shaped/renewing nature of discourse across interactional contexts.


Laura Paterson

Lancaster University, UK

‘You can just give those documents to myself’: Untriggered reflexive pronouns in 21st century spoken British English

Reflexive pronouns (myself, herself, etc.) must share reference with another grammatical unit in order to fulfil their syntactic criteria: in the sentence ‘The cat washes herself’, the noun phrase the cat and the reflexive pronoun herself represent the same entity and share a syntactic bond. However, despite syntactic constraints, reflexive pronouns occur without coreferent NPs in some varieties of English. In ‘You can just give those documents to myself’, the pronoun you and the reflexive pronoun myself cannot be coreferent and have different real-world referents. Reflexives occurring without coreferent noun phrases are classed as ‘untriggered’ and have traditionally been deemed ungrammatical. However, untriggered reflexives can be understood.

Using the Spoken BNC2014 early access subset, I will investigate the use of untriggered reflexives in 21st century spoken British English, asking:

  1. Do untriggered reflexives occur in particular syntactic positions?
  2. Does the use of untriggered reflexives correlate with use of a particular grammatical person?
  3. Does the use of untriggered reflexives correlate with particular demographic groups?
  4. How does the use of untriggered reflexives compare with the use of reflexives in 21st century spoken British English?

Check back soon for Part 4!

The Spoken BNC2014 early access projects: Part 2

In January, we announced the recipients of the Spoken BNC2014 Early Access Data Grants. Over the next several months, they will use exclusive access to the first five million words of Spoken BNC2014 data to carry out a total of thirteen research projects.

In this series of blogs, we are excited to share more information about these projects, in the words of their authors.

In Part 2 of our series, read about the work of Chris Ryder et al., Andreea Calude and Barbara McGillivray et al.


Chris Ryder, Jacqueline Laws and Sylvia Jaworska

University of Reading, UK

From oldies to selfies: A diachronic corpus-based study into changing productivity patterns in British English suffixation

The data from the Spoken BNC2014 early access subset will provide a unique opportunity to examine changes that have occurred in affix use in spoken British English over a twenty-year period; for example, the word selfie has only entered general usage since the invention of the iPhone. Using the recently developed MorphoQuantics database containing complex word data for 222 word-final affixes from the demographically sampled subset of the original Spoken BNC, direct comparisons can be made between old and new datasets, focussing on suffixation patterns, changes in productivity, and trends that demonstrate the shifts in semantic scope of individual suffixes. These features will be analysed chiefly through an examination (both quantitative and qualitative) of neologisms within the data, specifically regarding their regularity of construction, occurrence, and meaning.

This study is just one example of the diachronic morphological analyses that will be made available through a comparison of the Spoken BNC2014 EAS and the Spoken BNC, by utilising the categorisation system provided by MorphoQuantics.


Andreea Calude

University of Waikato, New Zealand

Sociolinguistic variation in cleft constructions: a quantitative corpus study of spontaneous conversation

This project concerns links between the use of various grammatical constructions and sociolinguistic variation. For example, is grammar used differently by men and women, or by younger and older speakers? We know that such variation can be observed for certain phonological features (e.g., some vowel sounds) and for certain pragmatic constructions (e.g., discourse markers and new and given information), but as regards grammatical features, the answer remains largely unknown or at best vague.

I intend to use the Spoken BNC2014 early access subset to investigate cleft constructions from a sociolinguistic variationist perspective, with the aim of uncovering (potential) systematic syntactic variation across age, gender, dialect, and socio-economic status. Clefts constitute the most frequently used focusing strategy in English, with demonstrative clefts being among the most common in spontaneous conversation, for example: “That is what I want to study”, “This is where I was born”. Despite intense diachronic and synchronic study of the structure and function of clefts in English, virtually nothing is known about the relationship between cleft use and sociolinguistic variation.

The Spoken BNC2014 data will be coded for all demonstrative clefts using a combination of manual and automatic detection, and each construction identified will be attributed to a particular speaker profile (in terms of their sociolinguistic features). Three linguistic features will also be coded for each construction, namely discourse function, reference direction (cataphoric or anaphoric), and information structure (amount of new and given information included). The data will be analysed using a mixed effects generalised linear regression model.


Barbara McGillivray1, Gard Buen Jenset1 and Michael Rundell2

1University of Oxford, UK

2Lexicography MasterClass, UK

The dative alternation revisited: fresh insights from contemporary spoken data

A well-known feature of English grammar is the dative alternation, whereby a verb may be used in an SVOO construction (Give me the money) or in the pattern SVO followed by a PP with the preposition to (Give the money to me). This is quite a well-researched topic, and generalizations have been made about the factors influencing a writer’s choice of one construction or another, and about which verbs show a preference for one of these patterns over the other. However, most of the studies published to date draw either on introspection or on data from written sources. The availability of contemporary, unscripted spoken data takes us into new territory, and offers an exciting opportunity to revisit this topic.

Our plan is to use the data from the Early Access Scheme to investigate verbs whose argument structure preferences include the dative alternation. Once we have all the relevant corpus data from the Spoken BNC2014 early access subset, we will analyse it using state-of-the-art multivariate statistical techniques, in order to account for the interplay of all the potentially significant variables, whether lexical, semantic, syntactic, or social. The proposed study thus exploits many of the unique features of this dataset, including the metadata on speakers and the USAS semantic tagging, to answer questions concerning the possible influence of semantic categories, socio-economic factors, gender, dialect and age, as well as linguistic features, on a speaker’s preferences. Once the study is complete, there would be opportunities for fresh comparative studies, either with the original Spoken BNC or with contemporary written data.


Check back soon for Part 3!

The Spoken BNC2014 early access projects: Part 1

In January, we announced the recipients of the Spoken BNC2014 Early Access Data Grants. Over the next several months, they will use exclusive access to the first five million words of Spoken BNC2014 data to carry out a total of thirteen research projects.

In this series of blogs, we are excited to share more information about these projects, in the words of their authors.

In Part 1 of our series, read about the work of Deanna Wong, Jonathan Culpeper and Robert Fuchs.


Deanna Wong

Macquarie University, Australia

Investigating British English backchannels in the Spoken BNC2014

Have you ever listened to someone listening? While we might expect that listeners are silent, it turns out that listeners have a lot to say. Mostly, this listener speech happens at the same time as the speaker is talking, but listeners are not talking to interrupt the speaker. Instead, listeners signal to the speaker that they are paying attention, that they agree with what the speaker has to say, and sometimes, that they are ready to have their turn at talking. The words that listeners use to signal these things can range from a simple mm to whole sentences. To make things even more interesting, how listeners listen varies across different parts of the world.

Sociolinguists use the term ‘backchannels’ to describe listener speech. Early research identified backchannels by careful investigations of individual conversations. That analysis took time, though, and it was not until researchers were able to access language corpora that we started to get a sense of the nature of backchannels in conversation on a larger scale.

However, looking for evidence of backchannels in a corpus has its own challenges. If the actual language used by listeners is to be uncovered, we cannot assume that backchannels take a specific form. Otherwise, we might miss something important! The key to unlocking this information is to use corpus annotation. Annotation is simply a way of marking what is happening in the talk. For example, corpus annotation can be used to indicate who is speaking, and whether they are speaking at the same time so that their speech overlaps.

In my investigation into the Spoken BNC2014 early access subset, I will be using annotation that marks overlapping speech to help identify potential backchannels in conversations from across the United Kingdom. The size of the corpus, together with the accompanying information about its speakers, will add to our understanding of how British speakers backchannel. It will also help us to compare their backchannels to those produced by speakers of English in other parts of the world.


Jonathan Culpeper

Lancaster University, UK

Politeness variation in England

The stereotype of British politeness is pervasive, and, moreover, it is usually linked to what people say. Take, as an example, this advice on British stereotypes for study abroad students:

The way that British people speak and the language that we use is also considered quite polite. The language that many people use, including lots of phrases like ‘please’, ‘thank you’, ‘pardon’ or ‘excuse me’ and ‘would you mind…’ certainly back this up […]

(http://www.your-study-abroad.com/2011/04/stereotypes-about-british-culture-%E2%80%93-how-true-are-they/)

In fact, the first item in the list, please, seems to be elevated by many English parents to the supernatural – the “magic word” for achieving successful requests. Similarly, in the earthly world of academia, a large number of studies have found evidence that present-day English politeness is often characterised by so-called “off-record” or “negative politeness” – it’s all about being indirect, showing respect for others’ privacy, freedom from disturbance, and so on. In the example, the expressions ‘would you mind’, ‘pardon me’ and ‘excuse me’ all readily fit this function. To these, one might add could you […], seemingly the most frequent way in which requests are performed in British English.

But is all this true? For starters, there’s a lingering concern that some British people may actually use other items, perhaps functional alternatives, just as or even more frequently. For instance, thank you is one expression, but what about ta or cheers? More substantially, for anybody living in the north of England, the idea of British indirectness does not entirely ring true. Indirectness is somewhat stand-offish and cold; not reflective of the much proclaimed northern warmth and friendliness. Consider this opinion (written by a Scotsman who has lived in various parts of Britain):

There is definitely a North/South divide when it comes to politeness. Having lived on the South coast of England and then Scotland, it is very noticeable that people are more friendly and polite the further North you go in Britain

(http://news.bbc.co.uk/1/hi/talking_point/759276.stm)

Politeness here is connected to friendliness. Maybe academia is orienting to a particular and different cultural stereotype of politeness, one based on a British southern perspective. Or maybe the idea of northern friendly politeness is a stereotype itself and has no basis in what people actually do.

This study sets out to examine these issues. I intend it to be a contribution to one of the newest sub-fields of linguistic pragmatics, variational pragmatics, which combines pragmatics and dialectology. One of the greatest impediments to doing such a study has been the lack of large quantities of spoken, especially conversational, data taken from across Britain. The Spoken British National Corpus 2014 early access subset offers a solution.


Robert Fuchs

University of Münster, Germany

Recent change in the sociolinguistics of intensifiers in British English

As social beings and speakers of a language we are extremely good at putting people into boxes – female and male, young and old, old-fashioned and hip. One of the many clues that allows us to make these (sometimes in fact unwarranted) assumptions is sociolinguistic variation. Whether and how women and men differ in how they speak, for example, is a hotly debated topic in- and outside of academia.

This study approaches this topic from two novel angles. Firstly, several sociolinguistic factors (age, social class, gender of speaker, gender of interlocutor and others) are considered in interaction with each other. Secondly, the study also looks at change across time, from the 1990s to the 2010s. For example, given the change in attitudes concerning what roles women and men are supposed to fulfil in society, I expect that any gender differences present in the 1990s will have decreased by the 2010s. The variable that the study investigates is the usage of so-called intensifiers (as in very good, so cool), which are said to occur more frequently in female than male speech.


Check back soon for Part 2!

What’s wrong with “a bunch of migrants”? Looking at the linguistic evidence

This week at Prime Minister’s Questions, David Cameron used the term “a bunch of migrants” to describe refugees at a camp in Calais. He was subsequently criticised by Labour MPs and members of the general public on Twitter, and the story was reported on in mainstream newspapers like the Guardian and the Telegraph. Critics described his comments as “dehumanising”, “callous” and “inflammatory”.

Something about David Cameron saying the words “bunch of” to describe a group of people caused a furore – but what was it? Is this how people normally use this phrase, or is this a noteworthy departure from the norm?

Here at CASS we have the unique opportunity to analyse a very large set of everyday conversations between speakers of British English from all over the UK, which participants have been recording in their homes and sending to us to be transcribed. Using the transcriptions, we can use computer software to analyse how words and phrases are commonly used across the entire country.

I searched through 4.5 million words of present day conversation to find out how people in the UK normally use the phrase “bunch of”. I found that “people”, “flowers” and “things” are the most likely words to be described in this way. Beyond this, there are several other words which refer to groups of people:

“kids”, “volunteers”, “retards”, “losers”, “lads”, “individuals”, “friends”, “dickheads”, “dancers”, “Aussies”, “alcoholics”, “thieving sods” and “thieving fuckers”.

Absent from this list is the word “migrants”, which does not occur in this context. The evidence suggests that people do often use “bunch of” to describe groups of people negatively or with distaste. Therefore the upset caused by Cameron’s use of the phrase “a bunch of migrants” is perhaps understandable.
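For readers curious about how this kind of search works in principle, here is a minimal sketch in Python. It is not the software used for the analysis above; the function name `bunch_of_collocates` and the sample utterances are invented purely for illustration, and the sketch only captures the single word that immediately follows “bunch of”, so it would record “thieving” rather than “thieving sods”.

```python
import re
from collections import Counter


def bunch_of_collocates(utterances):
    """Count the word that immediately follows 'bunch of' in each utterance.

    Hypothetical helper for illustration only; it assumes plain
    orthographic transcription with one utterance per string.
    """
    pattern = re.compile(r"\bbunch of (\w+)", re.IGNORECASE)
    counts = Counter()
    for utterance in utterances:
        for match in pattern.finditer(utterance):
            counts[match.group(1).lower()] += 1
    return counts


# Invented example utterances, NOT taken from the Spoken BNC2014.
sample = [
    "she bought a bunch of flowers for her mum",
    "there was a whole bunch of people outside",
    "I've got a bunch of things to do",
    "A bunch of people turned up late",
]

counts = bunch_of_collocates(sample)
for word, frequency in counts.most_common():
    print(word, frequency)
```

A word that never follows “bunch of” in the data simply has a count of zero, which is how an absence such as that of “migrants” would show up in a search like this.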

We are still collecting recordings from speakers all over the UK. For information on how to contribute to this project, which is led by Lancaster University and Cambridge University Press, please visit the Spoken BNC2014 website.

Spoken BNC2014 Early Access Data Grant Scheme – winning proposals

Lancaster University’s ESRC-funded Centre for Corpus Approaches to Social Science (CASS) and Cambridge University Press are pleased to announce the recipients of the Spoken BNC2014 Early Access Data Grants. These successful applicants will receive exclusive early access to approximately five million words of the Spoken BNC2014 via CQPweb. They will be the first to conduct research using the data and produce papers to be published in 2017, coinciding with the release of the full corpus.

The successful applicants, their institutions, and the research they intend to undertake, are:

 

Karin Aijmer

Gothenburg

Investigating intensifiers in the Spoken BNC2014

 

Karin Axelsson

Gothenburg

Canonical and non-canonical tag questions in the Spoken BNC2014: What has happened since the original BNC?

 

Andrew Caines1, Michael McCarthy2 and Paula Buttery1

1Cambridge, 2Nottingham

‘You still talking to me?’ The zero auxiliary progressive in spoken British English, twenty years on

 

Andreea Simona Calude

Waikato

Sociolinguistic Variation in Cleft Constructions – a quantitative corpus study of spontaneous conversation

 

Jonathan Culpeper

Lancaster

Politeness variation in England

 

Robert Fuchs

Münster

Recent change in the sociolinguistics of intensifiers in British English

 

Kazuki Hata, Yun Pan and Steve Walsh

Newcastle

Talking the talk, walking the walk: interactional competence in and out

 

Tanja Hessner and Ira Gawlitzek

Mannheim

Women speak in an emotional manner; men show their authority through speech! – A corpus-based study on linguistic differences showing which gender clichés are (still) true by analysing boosters in the Spoken BNC2014

 

Barbara McGillivray1, Gard Buen Jenset1 and Michael Rundell2

1Oxford, 2Lexicography MasterClass

The dative alternation revisited: fresh insights from contemporary spoken data

 

Laura Paterson

Lancaster

‘You can just give those documents to myself’: Untriggered reflexive pronouns in 21st century spoken British English

 

Chris Ryder, Jacqueline Laws and Sylvia Jaworska

Reading

From oldies to selfies: A diachronic corpus-based study into changing productivity patterns in British English suffixation

 

Tanja Säily1, Victoria González-Díaz2 and Jukka Suomela3

1Helsinki, 2Liverpool, 3Aalto

Variation in the productivity of adjective comparison

 

Deanna Wong

Macquarie

Investigating British English backchannels in the Spoken BNC2014

 

Thank you to everyone who applied, and congratulations to the winning proposals. Check back soon for more details on the Early Access Data Grant Scheme research.

 

Spoken BNC2014 meets FOLK

On Thursday 3rd December I visited the Institut für Deutsche Sprache (Institute for German Language) in Mannheim. The IDS is Germany’s national, non-university institution for the research and documentation of the German language in both the present day and the past.

I was thrilled to be invited there by Swantje Westpfahl, a PhD student at the Institute, who is working on the compilation of a large spoken corpus of German known as the FOLK (Forschungs- und Lehrkorpus Gesprochenes Deutsch; research and teaching corpus of spoken German). With the similarities between FOLK and the Spoken BNC2014 (my own PhD research project) apparent, we spent a day at the IDS learning about each other’s work.

In the morning, I gave an hour-long talk about the Spoken BNC2014, including an overview of our data collection and transcription methods as well as an investigation into speaker identification which I conducted earlier this year. I explained that, with a small budget, we (CASS and our partner Cambridge University Press) have very much favoured size and speed of production over minute detail of transcription; a decision that has allowed us to produce approximately 8 million words of orthographic transcription so far in only 18 months.

After lunch, I attended a workshop entitled “Spoken BNC2014 meets FOLK”, where Dr Thomas Schmidt gave an equivalent talk to my own about the FOLK project, followed by Swantje, whose specific focus is on the annotation of the transcribed corpus data. In terms of general design, the FOLK is fairly similar to the Spoken BNC2014; it contains transcripts of audio recordings of conversations held between speakers in a variety of settings. The major differences, as I learned, lie in the approach to transcription and the release of data. I learned about the incredible level of detail with which the FOLK recordings are transcribed, using Thomas’ own transcription software FOLKER. I was impressed by the affordances of this tool and the dedication to detail that was evident at the IDS, including the transcription of breathing, pauses measured to the millisecond and direct alignment to the (anonymized) audio recordings. All of this work takes a long time (on average, one hour of recording takes 100 hours to prepare in this way!), and as such the FOLK is much smaller than the Spoken BNC2014 (1.3 million words after three years), but extremely rich in terms of potential for analysis.

The IDS was in turn impressed by the Spoken BNC2014’s approach to data collection, where we ‘crowd-source’ participants and invite them, through media engagement and other means, to make recordings using their smartphones in exchange for payment. I suggested that they might like to try putting out a press release about marmalade to see whether the German media respond in the same way that the British media did.

Overall, my visit to Mannheim was a fantastic opportunity to learn about the FOLK project and to have some really interesting discussions about the aims of spoken corpus linguistics, and I would like to thank all at the IDS for their hospitality. I look forward to seeing Swantje again when CASS hosts her in Lancaster for a research visit in the Spring next year.

Spoken BNC2014 Early Access Data Grant Scheme – Applications now open

Lancaster University’s ESRC-funded Centre for Corpus Approaches to Social Science (CASS) and Cambridge University Press are excited to announce the Spoken British National Corpus 2014 Early Access Data Grant scheme.

Applications are now open for researchers at any level in the field of corpus linguistics and beyond to gain early access to a large subset of the Spoken BNC2014, which is currently being compiled and is due for release in late 2017. Successful applicants will write a paper based on their proposed research for exclusive publication (subject to peer review) in either a special issue of the International Journal of Corpus Linguistics or an edited collection.

We invite proposals for interesting and innovative research that would use approximately five million words of the upcoming Spoken BNC2014 as its primary source of data.

Successful applicants will gain access to the data via the CQPweb platform (cqpweb.lancs.ac.uk). Standard CQPweb functionality will be provided, including annotation (POS tagging, lemmatisation, semantic tagging), along with one new feature: the ability to search the corpus according to categories of speaker metadata such as gender, age, dialect and socio-economic status.

Proposals can approach the data from any theoretical angle, provided corpus methodologies are used and the research can be carried out within the affordances of CQPweb. Successful applicants will receive access to the data in February 2016 with a deadline for full paper submission in October 2016. Subject to peer review, papers will be published in one of the two Spoken BNC2014 launch publications in 2017 (a special issue of the International Journal of Corpus Linguistics has been agreed and a thematic edited collection is being planned).

This is a fantastic opportunity to work with the first very large, general corpus of informal British English conversation created since the original BNC more than twenty years ago. Successful applicants will get access to a large subset of the Spoken BNC2014 eighteen months before the full corpus is released, and will be the very first scholars to undertake and publish research based on this new dataset.

More details about the terms of the data grant scheme can be found in the application form. To apply, download and complete the application form and email it to Robbie Love (r.m.love@lancaster.ac.uk). The deadline for applications is Friday 11th December 2015.