Spoken BNC2014 Symposium

On the afternoon of Monday 26th June, CASS hosted a special symposium to celebrate the upcoming public launch of the Spoken British National Corpus 2014 – a corpus which members of CASS and Cambridge University Press have spent the last three years compiling.

More than fifty guests attended, representing a mixture of Lancaster Summer Schools participants, members of the CASS Challenge Panel, and those who travelled to Lancaster just for the day.

To kick off the symposium, CASS Centre Director Andrew Hardie said a few words about the history of Corpus Linguistics at Lancaster University, and put the compilation of a new BNC into context against previous developments in the field. He expressed his delight at the interest in the Spoken BNC2014 project as evidenced by the number of guests who were in attendance for the symposium.

I then gave the first talk alongside Claire Dembry (from Cambridge University Press) and Andrew Hardie, as representatives of the Spoken BNC2014 research team which also includes Vaclav Brezina and Tony McEnery. We discussed the main methodological decisions we made when thinking about the design, data collection, transcription and processing of the corpus. Andrew then gave a quick demonstration of the corpus in CQPweb, showing how features including speaker IDs, overlaps and attribution confidence are displayed in the interface.

Following our talk came the first of four research presentations, all of which used (the early access subset of) the Spoken BNC2014. The first of these was a talk by Karin Aijmer (University of Gothenburg) about the intensifier fucking, which went down very well with the audience. Karin’s Spoken BNC2014 research, which also includes other intensifiers, will be published as a chapter in Brezina et al. (forthcoming).

After a short break for refreshments, Jacqueline Laws (University of Reading) presented research into verb-forming suffixation which she had undertaken with Chris Ryder and Sylvia Jaworska. Comparing the demographically-sampled component of the Spoken BNC1994 to the new Spoken BNC2014, she found that females now appear to produce more neologisms (e.g. favouritize, popify) than males. Laws et al.’s research will be published in a forthcoming special issue of the International Journal of Corpus Linguistics.

Susan Reichelt (Lancaster University) was next to present her work on producing sociolinguistically comparable subsets of both the original and new Spoken British National Corpora. She highlighted a point which I had touched upon in my earlier talk: that the compilation of the Spoken BNC2014 sought to strike a balance between direct comparability with the original corpus on the one hand, and methodological improvement on the other. The areas where improvement was favoured over comparability (e.g. the classification of speaker socio-economic status) ought to be considered especially when thinking about sociolinguistic analysis. Susan’s work is associated with the recently announced CASS SDA project.

Finally, Jonathan Culpeper and Mathew Gillings (Lancaster University) presented their work on politeness variation between the north and south of England. They aimed to assess the extent to which commonly held stereotypes about differences between northern and southern politeness were reflected in language use in both the original and new corpora as a single dataset. Their work will be published as a chapter in Brezina et al. (forthcoming).

My reaction as the organiser of the symposium was that there is definitely a sense of anticipation about the release of the Spoken BNC2014, which is planned to take place in the autumn. Furthermore, it was lovely to meet so many friendly and enthusiastic attendees. I am very grateful to each of the speakers for giving such interesting talks, and to all who attended – especially those who tweeted their reactions to the talks using the #BNC2014 hashtag! As one of my final duties as a member of CASS before moving on to pastures new, I am very glad that the symposium went as well as it did.

Spoken BNC2014 book announcement

We are excited to announce a forthcoming book which will be published as part of the Routledge Advances in Corpus Linguistics series. “Corpus Approaches to Contemporary British Speech: Sociolinguistic Studies of the Spoken BNC2014” (edited by Vaclav Brezina, Robbie Love and Karin Aijmer) will feature a collection of research which is currently being undertaken by the recipients of the Spoken BNC2014 Early Access data grants.

With exclusive early access to approximately five million words of Spoken BNC2014 data, the book’s contributors will present a range of innovative studies which each analyse the corpus from a sociolinguistic perspective.

The book is anticipated to follow shortly after the public release of the complete Spoken BNC2014 (approximately ten million words) in late 2017. The book agreement with Routledge joins a previously announced special issue of the International Journal of Corpus Linguistics (IJCL), which will feature a range of work by other recipients of the Spoken BNC2014 Early Access data grants.

 

The Spoken BNC2014 early access projects: Part 4

In January, we announced the recipients of the Spoken BNC2014 Early Access Data Grants. Over the next several months, they will use exclusive access to the first five million words of Spoken BNC2014 data to carry out a total of thirteen research projects.

In this series of blogs, we are excited to share more information about these projects, in the words of their authors.

In the fourth and final part of our series, read about the work of Tanja Hessner & Ira Gawlitzek, Karin Axelsson, Andrew Caines et al. and Tanja Säily et al.


Tanja Hessner and Ira Gawlitzek

University of Mannheim, Germany

Women speak in an emotional manner; men show their authority through speech! – A corpus-based study on linguistic differences showing which gender clichés are (still) true by analysing boosters in the Spoken BNC2014

Western world clichés claim that women are emotional and often exaggerate, which is reflected in their speech. In contrast, men’s language is said to be characterised by bluntness. Aiming to shed a bit more light on statements like these, this study is going to consider gender differences on the lexical level.

In order to discover whether and, if so, to what extent there really is a difference between female and male speakers, the phenomenon of boosters will be investigated in the Spoken BNC2014 early access subset. Boosters such as totally or absolutely are particularly appealing and suitable for analysing gender differences since they are extremely multifaceted and they are indicators not only of lively, but also of emotional and powerful speech. Appropriate boosters will be investigated not only by using quantitative methods, but also by analysing the data in a qualitative way.


Karin Axelsson

University of Gothenburg, Sweden

Canonical and non-canonical tag questions in the Spoken BNC2014: What has happened since the original BNC?

What is happening to tag questions in British everyday conversation? Are canonical tag questions, where the form of the tag reflects that of the preceding clause (as in She won’t come, will she?), on the way out as the use of innit and other invariant tags is spreading? Who uses innit in 2014? The use of tag questions in the Spoken BNC2014 early access subset will be compared to the use in the demographic part of the original Spoken BNC reflecting the language of the early 1990s.


Andrew Caines1, Michael McCarthy2 and Paula Buttery1

1University of Cambridge, UK

2University of Nottingham, UK

‘You still talking to me?’ The zero auxiliary progressive in spoken British English, twenty years on

With early access to a subset of the Spoken BNC2014, we will be able to assess whether a supposedly ‘ungrammatical’ construction has become more frequently used in conversational British English over the past 20 years. The construction in question is the ‘zero auxiliary’ – for example, the progressive aspect construction may be used with an -ing verb form alone (“you talking to me?”, “What you doing?”, “We going to town”) whereas the standard rule is to combine an auxiliary verb (BE or HAVE) with the -ing form.

In the original Spoken BNC recorded in the early 1990s, the zero auxiliary occurred in one-in-twenty progressive constructions, a rate that rose to one-in-three if second person interrogatives (You talking to me? etc.) were considered alone. Moreover, younger working-class speakers were more likely to use the zero auxiliary than older middle-class speakers. We will investigate how these usage rates compare to the Spoken BNC2014, in the process updating the demographics of zero auxiliary use as well.


Tanja Säily1, Victoria González-Díaz2 and Jukka Suomela3

1University of Helsinki, Finland

2University of Liverpool, UK

3Aalto University, Finland

Variation in the productivity of adjective comparison

The functional competition between inflectional (‑er) and periphrastic (more) comparative strategies in English has received a great deal of attention in corpus-based research. A key area of competition remains relatively unexplored, however: the productivity of either comparative strategy, or how diversely they are used with different adjectives. The received wisdom is that inflection is fully productive, so we might expect to find no variation within the productivity of ‑er. However, recent research using new methods shows sociolinguistic variation in the productivity of extremely productive derivational suffixes. Whether the same variation applies to the productivity of inflectional processes remains an open question.

On the basis of the Spoken BNC2014 early access subset, our project will analyse intra- and extra-linguistic variation in the productivity of inflectional and periphrastic comparative strategies. Intra-linguistic factors include syntactic position, modification preferences, length and derivational type of the adjective. The extra-linguistic determinants focus on gender, age, socio-economic status, conversational setting and roles of the interlocutors. Our research constitutes a timely contribution to current knowledge of adjective comparison and morphological theory-building. If (a) variation in the productivity of inflectional comparison is found and (b) similar change in the productivity of both derivational and inflectional processes is observed, this will support our hypothesis that there is a derivation-to-inflection cline rather than a sharp divide.


Check back soon for more updates on the Spoken BNC2014 project!

The Spoken BNC2014 early access projects: Part 3

In January, we announced the recipients of the Spoken BNC2014 Early Access Data Grants. Over the next several months, they will use exclusive access to the first five million words of Spoken BNC2014 data to carry out a total of thirteen research projects.

In this series of blogs, we are excited to share more information about these projects, in the words of their authors.

In Part 3 of our series, read about the work of Karin Aijmer, Kazuki Hata et al. and Laura Paterson.


Karin Aijmer

University of Gothenburg, Sweden

Investigating intensifiers in the Spoken BNC2014

Intensifiers undergo rapid changes. Old ones may go out of fashion and be replaced by new ones even in a short diachronic perspective. They should therefore be studied in up-to-date spoken material. This project will describe ‘new’ intensifiers (or new developments of intensifiers) such as so (cool), fucking, damn, dead, enough and the contexts in which they are used. What do they for example collocate with? Who are the typical users?

The aim of the article using data from the Spoken BNC2014 early access subset is to study recent or on-going changes in the area of intensification. Intensifiers are interesting to study because they have a tendency to lose ground and may be replaced by other intensifiers even in a short diachronic perspective. Intensifiers have earlier been studied on the basis of the spoken part of the British National Corpus, and access to the EAS will make it possible to compare the frequencies of intensifiers across time. On the basis of the corpus data it will also be possible to give information about the speakers (e.g. whether they are teenagers or adults, gender and social class of the speakers).


Kazuki Hata, Yun Pan and Steve Walsh

Newcastle University, UK

Talking the talk, walking the walk: interactional competence in and out

Our project aims to characterise interactional competence through a comparison of casual conversation and institutional talk, two distinct genres. The proposed study will build on an ongoing project using the NUCASE corpus (Newcastle University Corpus of Academic Spoken English), led by the School of Education, Communication and Language Sciences, Newcastle University. From our analysis of the NUCASE data, we have identified specific features of interactional competence which operate in different academic contexts. Interactional competence, across a range of academic disciplines, can be characterised by identifying the key linguistic and interactional features, which promote engagement and maximise ‘learning’ and ‘learning opportunities’.

The proposed study would extend findings from the NUCASE study by comparing two corpora, and by highlighting the ways in which interactional competence operates in both formal and informal settings. We see the Spoken BNC2014 early access subset as an ideal source to accomplish our research aim, due to its geographical and functional features, offering a unique opportunity to study speakers’ interactional competence in different settings, with a particular focus on the ‘organising features’ of spoken interactions. We anticipate that the proposed study would bring into question some of the recent claims from functional/interactional linguistic studies, regarding the textual and interpersonal functions of several tokens, and provide a better understanding of the context-shaped/renewing nature of discourse across interactional contexts.


Laura Paterson

Lancaster University, UK

‘You can just give those documents to myself’: Untriggered reflexive pronouns in 21st century spoken British English

Reflexive pronouns (myself, herself, etc.) must share reference with another grammatical unit in order to fulfil their syntactic criteria: in the sentence ‘The cat washes herself’, the noun phrase the cat and the reflexive pronoun herself represent the same entity and share a syntactic bond. However, despite syntactic constraints, reflexive pronouns occur without coreferent NPs in some varieties of English. In ‘You can just give those documents to myself’, the pronoun you and the reflexive pronoun myself cannot be coreferent and have different real-world referents. Reflexives occurring without coreferent noun phrases are classed as ‘untriggered’ and have traditionally been deemed ungrammatical. However, untriggered reflexives can be understood.

Using the Spoken BNC2014 early access subset, I will investigate the use of untriggered reflexives in 21st century spoken British English, asking:

  1. Do untriggered reflexives occur in particular syntactic positions?
  2. Does the use of untriggered reflexives correlate with use of a particular grammatical person?
  3. Does the use of untriggered reflexives correlate with particular demographic groups?
  4. How does the use of untriggered reflexives compare with the use of reflexives in 21st century spoken British English?

Check back soon for Part 4!

The Spoken BNC2014 early access projects: Part 2

In January, we announced the recipients of the Spoken BNC2014 Early Access Data Grants. Over the next several months, they will use exclusive access to the first five million words of Spoken BNC2014 data to carry out a total of thirteen research projects.

In this series of blogs, we are excited to share more information about these projects, in the words of their authors.

In Part 2 of our series, read about the work of Chris Ryder et al., Andreea Calude and Barbara McGillivray et al.


Chris Ryder, Jacqueline Laws and Sylvia Jaworska

University of Reading, UK

From oldies to selfies: A diachronic corpus-based study into changing productivity patterns in British English suffixation

The data from the Spoken BNC2014 early access subset will provide a unique opportunity to examine changes that have occurred in affix use in spoken British English over a twenty-year period; for example, the word selfie has only entered general usage since the invention of the iPhone. Using the recently developed MorphoQuantics database containing complex word data for 222 word-final affixes from the demographically sampled subset of the original Spoken BNC, direct comparisons can be made between old and new datasets, focussing on suffixation patterns, changes in productivity, and trends that demonstrate the shifts in semantic scope of individual suffixes. These features will be analysed chiefly through an examination (both quantitative and qualitative) of neologisms within the data, specifically regarding their regularity of construction, occurrence, and meaning.

This study is just one example of the diachronic morphological analyses that will be made available through a comparison of the Spoken BNC2014 EAS and the Spoken BNC, by utilising the categorisation system provided by MorphoQuantics.


Andreea Calude

University of Waikato, New Zealand

Sociolinguistic variation in cleft constructions: a quantitative corpus study of spontaneous conversation

This project concerns links between the use of various grammatical constructions and sociolinguistic variation: for example, is grammar used differently by men and women, or by younger and older speakers? We know that such variation can be observed for certain phonological features (e.g., some vowel sounds) and for certain pragmatic constructions (e.g., discourse markers and new and given information), but as regards grammar features, the answer remains largely unknown or at best vague.

I intend to use the Spoken BNC2014 early access subset to investigate cleft constructions from a sociolinguistic variationist perspective, with the aim of uncovering (potential) systematic syntactic variation across age, gender, dialect, and socio-economic status. Clefts constitute the most frequently used focusing strategy in English, with demonstrative clefts being among the most common in spontaneous conversation, for example: “That is what I want to study”, “This is where I was born”. Despite intense diachronic and synchronic study of the structure and function of clefts in English, virtually nothing is known about the relationship between cleft use and sociolinguistic variation.

The Spoken BNC2014 data will be coded for all demonstrative clefts using a combination of manual and automatic detection, and each construction identified will be attributed to a particular speaker profile (in terms of their sociolinguistic features). Three linguistic features will also be coded for each construction, namely discourse function, reference direction (cataphoric or anaphoric), and information structure (amount of new and given information included).  The data will be analysed using a mixed effects generalised linear regression model.


Barbara McGillivray1, Gard Buen Jenset1 and Michael Rundell2

1University of Oxford, UK

2Lexicography MasterClass, UK

The dative alternation revisited: fresh insights from contemporary spoken data

A well-known feature of English grammar is the dative alternation, whereby a verb may be used in an SVOO construction (Give me the money) or in the pattern SVO followed by a PP with the preposition to (Give the money to me). This is quite a well-researched topic, and generalizations have been made about the factors influencing a writer’s choice of one construction or another, and about which verbs show a preference for one of these patterns over the other. However, most of the studies published to date draw either on introspection or on data from written sources. The availability of contemporary, unscripted spoken data takes us into new territory, and offers an exciting opportunity to revisit this topic.

Our plan is to use the data from the Early Access Scheme to investigate verbs whose argument structure preferences include the dative alternation. Once we have all the relevant corpus data from the Spoken BNC2014 early access subset, we will analyse it using state-of-the-art multivariate statistical techniques, in order to account for the interplay of all the potentially significant variables, whether lexical, semantic, syntactic, or social. The proposed study thus exploits many of the unique features of this dataset, including the metadata on speakers and the USAS semantic tagging, to answer questions concerning the possible influence of semantic categories, socio-economic factors, gender, dialect, age, as well as linguistic features on a speaker’s preferences. Once the study is complete, there would be opportunities for fresh comparative studies, either with the original Spoken BNC or with contemporary written data.


Check back soon for Part 3!

The Spoken BNC2014 early access projects: Part 1

In January, we announced the recipients of the Spoken BNC2014 Early Access Data Grants. Over the next several months, they will use exclusive access to the first five million words of Spoken BNC2014 data to carry out a total of thirteen research projects.

In this series of blogs, we are excited to share more information about these projects, in the words of their authors.

In Part 1 of our series, read about the work of Deanna Wong, Jonathan Culpeper and Robert Fuchs.


Deanna Wong

Macquarie University, Australia

Investigating British English backchannels in the Spoken BNC2014

Have you ever listened to someone listening? While we might expect that listeners are silent, it turns out that listeners have a lot to say. Mostly, this listener speech happens at the same time as when the speaker is talking, but listeners are not talking to interrupt the speaker. Instead, listeners signal to the speaker that they are paying attention, that they agree with what the speaker has to say, and sometimes, that they are ready to have their turn at talking. The words that listeners use to signal these things can range from a simple mm to whole sentences. To make things even more interesting, how listeners listen varies across different parts of the world.

Sociolinguists use the term ‘backchannels’ to describe listener speech. Early research identified backchannels by careful investigations of individual conversations. That analysis took time, though, and it was not until researchers were able to access language corpora that we started to get a sense of the nature of backchannels in conversation on a larger scale.

However, looking for evidence of backchannels in a corpus has its own challenges. If the actual language used by listeners is to be uncovered, we cannot assume that backchannels take a specific form. Otherwise, we might miss something important! The key to unlocking this information is to use corpus annotation. Annotation is simply a way of marking what is happening in the talk. For example, corpus annotation can be used to indicate who is speaking, and whether speakers are talking at the same time so that their speech overlaps.

In my investigation into the Spoken BNC2014 early access subset, I will be using annotation that marks overlapping speech to help identify potential backchannels in conversations from across the United Kingdom. The size of the corpus and the accompanying information about its speakers will add to our understanding of how British speakers backchannel. It will also help us to compare their backchannels to those produced by speakers of English in other parts of the world.


Jonathan Culpeper

Lancaster University, UK

Politeness variation in England

The stereotype of British politeness is pervasive, and, moreover, it is usually linked to what people say. Take, as an example, this advice on British stereotypes for study abroad students:

The way that British people speak and the language that we use is also considered quite polite. The language that many people use, including lots of phrases like ‘please’, ‘thank you’, ‘pardon’ or ‘excuse me’ and ‘would you mind…’ certainly back this up […]

(http://www.your-study-abroad.com/2011/04/stereotypes-about-british-culture-%E2%80%93-how-true-are-they/)

In fact, the first item in the list, please, seems to be elevated by many English parents to the supernatural – the “magic word” for achieving successful requests. Similarly, in the earthly world of academia, a large number of studies have found evidence that present-day English politeness is often characterised by so-called “off-record” or “negative politeness” –  it’s all about being indirect, showing respect for others’ privacy, freedom from disturbance, and so on. In the example, the expressions ‘would you mind’, ‘pardon me’ and ‘excuse me’ all readily fit this function. To these, one might add could you […], seemingly, the most frequent way in which requests are performed in British English.

But is all this true? For starters, there’s a lingering concern that some British people may actually use other items, perhaps functional alternatives, just as or even more frequently. For instance, thank you is one expression, but what about ta or cheers? More substantially, for anybody living in the north of England, the idea of British indirectness does not entirely ring true. Indirectness is somewhat stand-offish and cold; not reflective of the much proclaimed northern warmth and friendliness. Consider this opinion (written by a Scotsman who has lived in various parts of Britain):

There is definitely a North/South divide when it comes to politeness. Having lived on the South coast of England and then Scotland, it is very noticeable that people are more friendly and polite the further North you go in Britain

(http://news.bbc.co.uk/1/hi/talking_point/759276.stm)

Politeness here is connected to friendliness. Maybe academia is orienting to a particular and different cultural stereotype of politeness, one based on a British southern perspective. Or maybe the idea of northern friendly politeness is a stereotype itself and has no basis in what people actually do.

This study sets out to examine these issues. I intend it to be a contribution to one of the newest sub-fields of linguistic pragmatics, variational pragmatics, which combines pragmatics and dialectology. One of the greatest impediments to doing such a study has been the lack of large quantities of spoken, especially conversational, data taken from across Britain. The Spoken British National Corpus 2014 early access subset offers a solution.


Robert Fuchs

University of Münster, Germany

Recent change in the sociolinguistics of intensifiers in British English

As social beings and speakers of a language we are extremely good at putting people into boxes – female and male, young and old, old-fashioned and hip. One of the many clues that allows us to make these (sometimes in fact unwarranted) assumptions is sociolinguistic variation. Whether and how women and men differ in how they speak, for example, is a hotly debated topic in- and outside of academia.

This study approaches this topic from two novel angles. The first is that several sociolinguistic factors, age, social class, gender of speaker, gender of interlocutor and others, are considered in interaction with each other. Secondly, the study also looks at change across time, from the 1990s to the 2010s. For example, given the change in attitudes concerning what roles women and men are supposed to fulfil in society, I expect that any gender differences present in the 1990s will have decreased by the 2010s. The variable that the study investigates is the usage of so-called intensifiers (as in *very* good, *so* cool), which are said to occur more frequently in female than male speech.


Check back soon for Part 2!

What’s wrong with “a bunch of migrants”? Looking at the linguistic evidence

This week at Prime Minister’s Questions, David Cameron used the term “a bunch of migrants” to describe refugees at a camp in Calais. He was subsequently criticised by Labour MPs and members of the general public on Twitter, and the story was reported on in mainstream newspapers like the Guardian and the Telegraph. Critics described his comments as “dehumanising”, “callous” and “inflammatory”.

Something about David Cameron saying the words “bunch of” to describe a group of people caused a furore – but what was it? Is this how people normally use this phrase, or is this a noteworthy departure from the norm?

Here at CASS we have the unique opportunity to analyse a very large set of everyday conversations between speakers of British English from all over the UK, which participants have been recording in their homes and sending to us to be transcribed. Using the transcriptions, we can use computer software to analyse how words and phrases are used commonly across the entire country.

I searched through 4.5 million words of present-day conversation to find out how people in the UK normally use the phrase “bunch of”. I found that “people”, “flowers” and “things” are the most likely words to be described in this way. Beyond this, there are several other words which refer to groups of people:

“kids”, “volunteers”, “retards”, “losers”, “lads”, “individuals”, “friends”, “dickheads”, “dancers”, “Aussies”, “alcoholics”, “thieving sods” and “thieving fuckers”.

Absent from this list is the word “migrants”, which does not occur in this context. The evidence suggests that people do often use “bunch of” to describe groups of people negatively or with distaste. Therefore the upset caused by Cameron’s use of the phrase “a bunch of migrants” is perhaps understandable.
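For readers curious about what this kind of search involves, here is a minimal sketch in Python. It is not the software used for this post: the function name `right_collocates` and the sample utterances are invented for illustration, and a real corpus query would run over the full transcriptions with dedicated corpus tools. The sketch simply counts which word comes immediately after “bunch of”.

```python
import re
from collections import Counter


def right_collocates(utterances, phrase="bunch of"):
    """Count the word that immediately follows `phrase` in each utterance.

    Hypothetical helper for illustration only; not part of any corpus tool.
    """
    # Match the phrase, then capture the next run of letters or apostrophes.
    pattern = re.compile(
        r"\b" + re.escape(phrase) + r"\s+([a-z']+)", re.IGNORECASE
    )
    counts = Counter()
    for utterance in utterances:
        for match in pattern.finditer(utterance):
            counts[match.group(1).lower()] += 1
    return counts


# Invented toy utterances, NOT taken from the Spoken BNC2014.
sample = [
    "there was a whole bunch of people outside",
    "she bought a bunch of flowers",
    "a bunch of people turned up late",
    "I have a bunch of things to do",
]

counts = right_collocates(sample)
# List the following words from most to least frequent.
for word, frequency in counts.most_common():
    print(word, frequency)
```

A count like this only looks at the single word after the phrase, so multi-word descriptions such as “thieving sods” would still need to be read in context.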

We are still collecting recordings from speakers all over the UK. For information on how to contribute to this project, which is led by Lancaster University and Cambridge University Press, please visit the Spoken BNC2014 website.

Spoken BNC2014 Early Access Data Grant Scheme – winning proposals

Lancaster University’s ESRC-funded Centre for Corpus Approaches to Social Science (CASS) and Cambridge University Press are pleased to announce the recipients of the Spoken BNC2014 Early Access Data Grants. These successful applicants will receive exclusive early access to approximately five million words of the Spoken BNC2014 via CQPweb. They will be the first to conduct research using the data and produce papers to be published in 2017, coinciding with the release of the full corpus.

The successful applicants, their institutions, and the research they intend to undertake, are:

Karin Aijmer (Gothenburg): Investigating intensifiers in the Spoken BNC2014

Karin Axelsson (Gothenburg): Canonical and non-canonical tag questions in the Spoken BNC2014: What has happened since the original BNC?

Andrew Caines (Cambridge), Michael McCarthy (Nottingham) and Paula Buttery (Cambridge): ‘You still talking to me?’ The zero auxiliary progressive in spoken British English, twenty years on

Andreea Simona Calude (Waikato): Sociolinguistic Variation in Cleft Constructions – a quantitative corpus study of spontaneous conversation

Jonathan Culpeper (Lancaster): Politeness variation in England

Robert Fuchs (Münster): Recent Change in the sociolinguistics of intensifiers in British English

Kazuki Hata, Yun Pan and Steve Walsh (Newcastle): Talking the talk, walking the walk: interactional competence in and out

Tanja Hessner and Ira Gawlitzek (Mannheim): Women speak in an emotional manner; men show their authority through speech! – A corpus-based study on linguistic differences showing which gender clichés are (still) true by analysing boosters in the Spoken BNC2014

Barbara McGillivray (Oxford), Gard Jenset (Oxford) and Michael Rundell (Lexicography MasterClass): The dative alternation revisited: fresh insights from contemporary spoken data

Laura Paterson (Lancaster): ‘You can just give those documents to myself’: Untriggered reflexive pronouns in 21st century spoken British English

Chris Ryder, Jacqueline Laws and Sylvia Jaworska (Reading): From oldies to selfies: A diachronic corpus-based study into changing productivity patterns in British English suffixation

Tanja Säily (Helsinki), Victoria González-Díaz (Liverpool) and Jukka Suomela (Aalto): Variation in the productivity of adjective comparison

Deanna Wong (Macquarie): Investigating British English backchannels in the Spoken BNC2014

Thank you to everyone who applied, and congratulations to the winning proposals. Check back soon for more details on the Early Access Data Grant Scheme research.

Spoken BNC2014 meets FOLK

On Thursday 3rd December I visited the Institut für Deutsche Sprache (Institute for German Language) in Mannheim. The IDS is Germany’s national, non-university institution for the research and documentation of the German language in both the present day and the past.

I was thrilled to be invited there by Swantje Westpfahl, a PhD student at the Institute, who is working on the compilation of a large spoken corpus of German known as the FOLK (Forschungs- und Lehrkorpus Gesprochenes Deutsch; research and teaching corpus of spoken German). With the similarities between FOLK and the Spoken BNC2014 (my own PhD research project) apparent, we spent a day at the IDS learning about each other’s work.

In the morning, I gave an hour-long talk about the Spoken BNC2014, including an overview of our data collection and transcription methods as well as an investigation into speaker identification which I conducted earlier this year. I explained that, with a small budget, we (CASS and our partner Cambridge University Press) have very much favoured size and speed of production over minute detail of transcription; a decision that has allowed us to produce approximately 8 million words of orthographic transcription so far in only 18 months.

After lunch, I attended a workshop entitled “Spoken BNC2014 meets FOLK”, where Dr Thomas Schmidt gave an equivalent talk to my own about the FOLK project, followed by Swantje, whose specific focus is on the annotation of the transcribed corpus data. In terms of general design, the FOLK is fairly similar to the Spoken BNC2014; it contains transcripts of audio recordings of conversations held between speakers in a variety of settings. The major differences, as I learned, lie in the approach to transcription and the release of data. I learned about the incredible level of detail with which the FOLK recordings are transcribed, using Thomas’ own transcription software FOLKER. I was impressed by the affordances of this tool and the dedication to detail that was evident at the IDS, including the transcription of breathing, pauses measured to the millisecond and direct alignment to the (anonymized) audio recordings. All of this work takes a long time (on average, one hour of recording takes 100 hours to prepare in this way!), and as such the FOLK is much smaller than the Spoken BNC2014 (1.3 million words after three years), but extremely rich in terms of potential for analysis.

The IDS was in turn impressed by the Spoken BNC2014’s approach to data collection, where we ‘crowd-source’ participants and invite them, through media engagement and other means, to make recordings using their smartphones in exchange for payment. I suggested that they might like to try putting out a press release about marmalade to see whether the German media respond in the same way that the British media did.

Overall, my visit to Mannheim was a fantastic opportunity to learn about the FOLK project and to have some really interesting discussions about the aims of spoken corpus linguistics, and I would like to thank all at the IDS for their hospitality. I look forward to seeing Swantje again when CASS hosts her in Lancaster for a research visit in the Spring next year.

Spoken BNC2014 Early Access Data Grant Scheme – Applications now open

Lancaster University’s ESRC-funded Centre for Corpus Approaches to Social Science (CASS) and Cambridge University Press are excited to announce the Spoken British National Corpus 2014 Early Access Data Grant scheme.

Applications are now open for researchers at any level in the field of corpus linguistics and beyond to gain early access to a large subset of the Spoken BNC2014, which is currently being compiled and is due for release in late 2017. Successful applicants will write a paper based on their proposed research for exclusive publication (subject to peer review) in either a special issue of the International Journal of Corpus Linguistics or an edited collection.

We invite proposals for interesting and innovative research that would use approximately five million words of the upcoming Spoken BNC2014 as its primary source of data.

Successful applicants will gain access to the data via the CQPweb platform (cqpweb.lancs.ac.uk). Standard CQPweb functionality will be provided, including annotation (POS tagging, lemmatisation, semantic tagging), along with one new feature: the ability to search the corpus according to categories of speaker metadata such as gender, age, dialect and socio-economic status.

Proposals can approach the data from any theoretical angle, provided corpus methodologies are used and the research can be carried out within the affordances of CQPweb. Successful applicants will receive access to the data in February 2016 with a deadline for full paper submission in October 2016. Subject to peer review, papers will be published in one of the two Spoken BNC2014 launch publications in 2017 (a special issue of the International Journal of Corpus Linguistics has been agreed and a thematic edited collection is being planned).

This is a fantastic opportunity to work with the first very large, general corpus of informal British English conversation created since the original BNC more than twenty years ago. Successful applicants will get access to a large subset of the Spoken BNC2014 eighteen months before the full corpus is released, and will be the very first scholars to undertake and publish research based on this new dataset.

More details about the terms of the data grant scheme can be found in the application form. To apply, download and complete the application form and email it to Robbie Love (r.m.love@lancaster.ac.uk). The deadline for applications is Friday 11th December 2015.