The Spoken BNC2014 early access projects: Part 4

In January, we announced the recipients of the Spoken BNC2014 Early Access Data Grants. Over the next several months, they will use exclusive access to the first five million words of Spoken BNC2014 data to carry out a total of thirteen research projects.

In this series of blogs, we are excited to share more information about these projects, in the words of their authors.

In the fourth and final part of our series, read about the work of Tanja Hessner & Ira Gawlitzek, Karin Axelsson, Andrew Caines et al. and Tanja Säily et al.

Tanja Hessner and Ira Gawlitzek

University of Mannheim, Germany

Women speak in an emotional manner; men show their authority through speech! – A corpus-based study on linguistic differences showing which gender clichés are (still) true by analysing boosters in the Spoken BNC2014

Western world clichés claim that women are emotional and often exaggerate, which is reflected in their speech. In contrast, men’s language is said to be characterised by bluntness. Aiming to shed a bit more light on statements like these, this study is going to consider gender differences on the lexical level.

In order to discover whether and, if so, to what extent there really is a difference between female and male speakers, the phenomenon of boosters will be investigated in the Spoken BNC2014 early access subset. Boosters such as totally or absolutely are particularly appealing and suitable for analysing gender differences since they are extremely multifaceted and they are indicators not only of lively, but also of emotional and powerful speech. The relevant boosters will be investigated not only with quantitative methods but also through qualitative analysis of the data.

Karin Axelsson

University of Gothenburg, Sweden

Canonical and non-canonical tag questions in the Spoken BNC2014: What has happened since the original BNC?

What is happening to tag questions in British everyday conversation? Are canonical tag questions, where the form of the tag reflects that of the preceding clause (as in She won’t come, will she?), on the way out as the use of innit and other invariant tags is spreading? Who uses innit in 2014? The use of tag questions in the Spoken BNC2014 early access subset will be compared to the use in the demographic part of the original Spoken BNC reflecting the language of the early 1990s.

Andrew Caines¹, Michael McCarthy² and Paula Buttery¹

¹University of Cambridge, UK

²University of Nottingham, UK

‘You still talking to me?’ The zero auxiliary progressive in spoken British English, twenty years on

With early access to a subset of the Spoken BNC2014, we will be able to assess whether a supposedly ‘ungrammatical’ construction has become more frequently used in conversational British English over the past 20 years. The construction in question is the ‘zero auxiliary’ – for example, the progressive aspect construction may be used with an -ing verb form alone (“you talking to me?”, “What you doing?”, “We going to town”) whereas the standard rule is to combine an auxiliary verb (BE or HAVE) with the -ing form.

In the original Spoken BNC recorded in the early 1990s, the zero auxiliary occurred in one-in-twenty progressive constructions, a rate that rose to one-in-three if second person interrogatives (You talking to me? etc.) were considered alone. Moreover, younger working-class speakers were more likely to use the zero auxiliary than older middle-class speakers. We will investigate how these usage rates compare to the Spoken BNC2014, in the process updating the demographics of zero auxiliary use as well.
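The rates quoted above are simple proportions: the number of zero auxiliary progressives divided by the number of all progressive constructions in the group of interest. A minimal Python sketch of that arithmetic follows. The counts are hypothetical, invented only so that the output matches the one-in-twenty and one-in-three figures, and the function name `zero_aux_rate` is an assumption for illustration rather than anything taken from the project.

```python
from fractions import Fraction

def zero_aux_rate(zero_aux: int, all_progressives: int) -> Fraction:
    """Proportion of progressive constructions that lack an auxiliary."""
    if all_progressives <= 0:
        raise ValueError("all_progressives must be positive")
    return Fraction(zero_aux, all_progressives)

# Hypothetical counts, invented to match the rates quoted in the text.
overall = zero_aux_rate(50, 1000)
second_person_questions = zero_aux_rate(40, 120)

for label, rate in [("all progressives", overall),
                    ("second person interrogatives", second_person_questions)]:
    print(f"{label}: {rate.numerator} in {rate.denominator} ({float(rate):.1%})")
```

The same function could be applied to counts split by age or social class to compare demographic groups, provided each group's own total of progressive constructions is used as the denominator.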

Tanja Säily¹, Victoria González-Díaz² and Jukka Suomela³

¹University of Helsinki, Finland

²University of Liverpool, UK

³Aalto University, Finland

Variation in the productivity of adjective comparison

The functional competition between inflectional (‑er) and periphrastic (more) comparative strategies in English has received a great deal of attention in corpus-based research. A key area of competition remains relatively unexplored, however: the productivity of either comparative strategy, or how diversely they are used with different adjectives. The received wisdom is that inflection is fully productive, so we might expect to find no variation within the productivity of ‑er. However, recent research using new methods shows sociolinguistic variation in the productivity of extremely productive derivational suffixes. Whether the same variation applies to the productivity of inflectional processes remains an open question.

On the basis of the Spoken BNC2014 early access subset, our project will analyse intra- and extra-linguistic variation in the productivity of inflectional and periphrastic comparative strategies. Intra-linguistic factors include syntactic position, modification preferences, length and derivational type of the adjective. The extra-linguistic determinants focus on gender, age, socio-economic status, conversational setting and roles of the interlocutors. Our research constitutes a timely contribution to current knowledge of adjective comparison and morphological theory-building. If (a) variation in the productivity of inflectional comparison is found and (b) similar change in the productivity of both derivational and inflectional processes is observed, this will support our hypothesis that there is a derivation-to-inflection cline rather than a sharp divide.

Check back soon for more updates on the Spoken BNC2014 project!

Reflections from the CASS student challenge panel member, part 1

Each year, one student from an outside institution is appointed to ‘challenge’ CASS with concepts from their own novel research. Pamela Irwin, the 2013/2014 student challenge panel member, is beginning to wrap up her ‘term’, and has put together a series of reflections on the process. Read the first entry below.

I am a mature student with a background in health and higher education, and currently completing my PhD in gerontology. My research centres on the interaction between age, gender and the community in the context of resilience in older women living on their own in rural Australia.

Although ageing is informed by many disciplines, my research route is via the broad domain of social sciences. Serendipitously, a peer review of a journal article was responsible for my formal exposure to linguistics and corpus linguistics. The reviewers indicated that my paper reflected a sociological rather than the requisite social psychology orientation, and while I was aware that my topic crossed these disciplines, I was not fully cognisant of the critical importance of language in differentiating these subtleties. As a result, I enrolled in a corpus linguistic programme designed to improve academic language use, and through the inaugural CASS summer school, I was then able to consolidate, expand and apply this knowledge. This immersion in the world of linguistics stimulated a new and growing interest in the ‘function’ of language in academia and everyday life.

However, I soon realised that my grounding in the grammatical structures of the English language was extremely basic. While I could identify the fundamental parts of speech, I could not parse a sentence, and any further analysis was well beyond my skill set. Since then, I have been introduced to new concepts (semiosis), terminology (concatenate), techniques (linguistic ‘friendly’ transcribing) and technology (Wmatix) amongst others, as well as being challenged to rethink and change some of my preconceived ideas (metaphor).

Here, my understanding of the figures of speech is particularly salient. Resilience, a key theme in my research, tends to have different meanings depending on both the subject and context. An overview of the literature suggests that resilience is often described metaphorically as ‘bouncing back’ in academic and popular psychology, whereas in an Australian setting, resilience is more likely to be associated with an image of ‘the (little) Aussie battler’ (Moore, 2010). In this context, resilience represents perseverance, with the ‘underdog’ battling against all odds to overcome hardship in adverse conditions. By contrast, at a systems (socio-ecological) level, resilience is not yet related to a specific metaphor or image. It is, however, closely linked to a related term, ‘panarchy’, that involves a dynamic process of adaptation and transformation.

Thus resilience is defined by a metaphor (a ball), an image (a battler) and a conceptual term (panarchy) in my study. These differences provide a rich ‘landscape’ to uncover with corpus linguistics.


Moore, B. (2010). What’s their story? A history of Australian words. Melbourne: Oxford University Press.

Return soon to read Pamela’s next installment! Are you interested in becoming the next student challenge panel member? Apply to attend our free summer school to learn more.

Using Corpora to Analyze Gender

I wrote UCAG during a sabbatical as a semi-sequel to a book I published in 2006 called Using Corpora for Discourse Analysis. Part of the reason for the second book was to update and expand some of my thinking around discourse- or social-related corpus linguistics. As time has passed, I haven’t become disenamoured of corpus methods, but I have become more reflective and critical of them and I wanted to use the book to highlight what they can and can’t do, and how researchers need to be guarded against using tools which might send them down a particular analytical path with a set of pre-ordained answers. Part of this has involved reflecting on how interpretations and explanations of corpus findings often need to come from outside the texts themselves (one of the tenets of critical discourse analysis), and subsequently whether a corpus approach requires analysts to go further and critically evaluate their findings in terms of “who benefits”.

Another way in which my thinking around corpus linguistics has developed since 2006 is in considering the advantages of methodological triangulation (or approaching a research project in multiple ways). In one analysis chapter I take three small corpora of adverts from Craigslist and try out three methods of attempting to uncover something interesting about gender from them – one very broad involving an automated tagging of every word, one semi-automatic relying on a focus on a smaller set of words, and another much more qualitative, relying on looking at concordance lines only. In another chapter I look at “difficult” search terms – comparing two methods of finding all the cases where a lecturer indicates that a student has given an incorrect answer in a corpus of academic-related speech. Would it be better to just read the whole corpus from start to finish, or is it possible to devise search terms so concordancing would elicit pretty much the same set?

The book also gave me a chance to revisit older data, particularly a set of newspaper articles about gay people from the Daily Mail which I had first looked at in Public Discourses of Gay Men (2005). As a replication experiment I revisited that data and redid an analysis I had first carried out about 10 years ago. While the idea of an objective researcher is fictional, corpus methods have aimed to redress the issue of researcher bias to an extent – although in retreading my steps, I did not obtain exactly the same results. Fortunately, the overall outcome was the same, but there were a few important points that the 10 years younger version of me missed. Does that matter? I suspect it doesn’t invalidate the analysis although it is a useful reminder about how our own analytical abilities alter over time.

Part of the reason for writing the book was to address other researchers who are either from corpus linguistics and want to look at gender, or who do research in gender and want to use corpus methods. I sometimes feel that these two groups of people do not talk to each other very much and as a result the corpus research in this area is often based around the “gender differences” paradigm where the focus is on how men and women apparently differ from each other in language use (with attendant metaphors about Mars and Venus). Chapters 2 and, to an extent, 3 address this by trying a number of experiments to see just how much lexical variation there is in sets of spoken corpora of male and female language – and when difference is found, how can it be explained? I also warn against lumping all men together into a box to compare them with all women who are put in a second box. The variation within the boxes can actually be the more interesting story to tell and this is where corpus tools around dispersion can really come into their own. So even if, for example, men do swear more than women, it’s not all men and not all the time. On the other hand, some differences which are more consistent and widespread can be incredibly revealing, although not in ways you might think – chapter 2 took me down an analytical path that ended up at the word Christmas – not perhaps an especially interesting word relating to gender, but it produced a lovely punchline to the chapter.
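To make the point about dispersion concrete, here is a minimal Python sketch of one widely used dispersion measure, Gries's DP (deviation of proportions). The text above does not say which dispersion statistic the book uses, so DP is an assumption chosen for illustration. The speaker counts are hypothetical. The measure compares the share of a word's occurrences found in each speaker's data with the share of the whole corpus that speaker contributes: 0 means the word is spread perfectly evenly, and values approaching 1 mean it is concentrated in a few speakers.

```python
def deviation_of_proportions(word_counts, part_sizes):
    """Gries's DP: half the summed absolute gap between observed and expected shares."""
    if len(word_counts) != len(part_sizes):
        raise ValueError("need one word count per corpus part")
    total_hits = sum(word_counts)
    total_size = sum(part_sizes)
    if total_hits == 0 or total_size == 0:
        raise ValueError("word and corpus totals must be non-zero")
    return 0.5 * sum(
        abs(hits / total_hits - size / total_size)
        for hits, size in zip(word_counts, part_sizes)
    )

# Hypothetical data: four male speakers, 10,000 words each, 40 swear words overall.
sizes = [10_000, 10_000, 10_000, 10_000]
everyone_swears = [10, 10, 10, 10]   # same total, evenly spread
one_man_swears = [40, 0, 0, 0]       # same total, a single speaker

print(deviation_of_proportions(everyone_swears, sizes))
print(deviation_of_proportions(one_man_swears, sizes))
```

Both hypothetical groups produce the same overall frequency, so a simple male versus female comparison would treat them identically; only the dispersion value shows that in the second case the pattern is driven by one speaker.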

It was also good to introduce different corpora, tools and techniques that weren’t available in 2006. Mark Davies has an amazing set of online corpora, mostly based around American English, and I took the opportunity to use the COHA (Corpus of Historical American English) to track changes in language which reflects male bias over time, from the start of the 19th century to the present day. Another chapter utilises Adam Kilgarriff’s online tool Sketch Engine which allows collocates to be calculated in terms of their grammatical relationships to one another. This made it possible to compare the terms boy and girl by considering the verbs that positioned either as subject or object. So girls are more likely to be impressed while boys are more likely to be outperformed. On the other hand boys cry whereas girls scream.

It would be great if the book inspired other researchers to consider the potential of using corpora in discourse/social related subjects as well as showing how this potential has expanded in recent years. It’s been fun to explore a relatively unexplored field (or rather travel a route between two connecting fields) but it occasionally gets lonely. I hope to encounter a few more people heading in the same direction as me in the coming years.

Discourse, Gender and Sexuality South-South Dialogues Conference

Last week was spent at Witwatersrand (Wits) University in Johannesburg where I had been invited to give a workshop on corpus methods, as well as a talk on some of my own research. The week was topped off by the first Discourse, Gender and Sexuality South-South Dialogues Conference which was organised by Tommaso Milani. Many of the papers at the conference used qualitative methods (analyses of visual data seemed particularly popular) but there were a few papers, including my own, which used corpus methods.

These included a paper by Megan Edwards who combined a corpus approach with CDA and visual analysis to examine a small corpus of pamphlets found around Johannesburg – these pamphlets advertise remedies for sexual and relationship problems and Megan demonstrated that embedded within the adverts were gendered discourses – relating to notions of ideal masculinity and femininity. This is probably one of the few corpora in existence where the top lexical word is penis.

Another interesting paper was by Sally Hunt who examined corpora of articles about sex work in two South African newspapers, focussing on the period when SA hosted the World Cup. She found that while there was a more balanced set of representations of sex workers than expected, they were still largely represented as immoral and criminalised for their actions while the agency of their clients was largely obscured. Sally is a lecturer at Rhodes University, Grahamstown, and has recently completed the construction of a 1 million word South African corpus, using the Brown family sampling frame.

During the workshop that I hosted at the university I got participants to use AntConc to examine a small corpus of recent newspaper articles about feminists, and a number of interesting patterns emerged from the analyses of concordances and collocates that took place. For example, a representation of feminists as war-mongers or as vocally annoying or fierce (e.g. shrill, strident) was very prevalent and perhaps expected, although we were surprised to see a sub-set of words which related feminists to Islam like feminist Taleban and feminist fatwas (killing two ideological birds with one stone). Additionally, it was interesting to see how these negative discourses shouldn’t always be taken at face value. They were sometimes quoted in order to be critical of them, although it was often only with expanded concordance lines that this could be seen. In all, a productive week, and it was good to meet so many people who were interested in finding out more about corpus linguistics.
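For readers who have not seen a concordance, the idea can be sketched in a few lines of Python. This is not how AntConc works internally; it is a simplified illustration, and the function name `kwic`, the `width` parameter and the sample sentences are all invented for the purpose. It shows why expanding the context window matters: a narrow window can hide the fact that a negative description is being quoted critically.

```python
import re

def kwic(text: str, node: str, width: int = 30):
    """Return (left context, match, right context) for each word starting with `node`."""
    pattern = re.compile(rf"\b{re.escape(node)}\w*", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append((left, match.group(0), right))
    return lines

# Invented sample text, not taken from the workshop corpus.
sample = ("He dismissed the strident feminists. "
          "Critics were wrong to call feminists shrill, she wrote.")

for width in (10, 60):
    print(f"--- window of {width} characters ---")
    for left, hit, right in kwic(sample, "feminist", width):
        print(f"{left:>{width}} [{hit}] {right}")
```

With the narrow window the second hit shows only "feminists shrill"; the wider window brings "Critics were wrong to call" into view, which reverses the reading.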