Spoken BNC2014 Early Access Data Grant Scheme – Applications now open

Lancaster University’s ESRC-funded Centre for Corpus Approaches to Social Science (CASS) and Cambridge University Press are excited to announce the Spoken British National Corpus 2014 Early Access Data Grant scheme.

Applications are now open for researchers at any level in the field of corpus linguistics and beyond to gain early access to a large subset of the Spoken BNC2014, which is currently being compiled and is due for release in late 2017. Successful applicants will write a paper based on their proposed research for exclusive publication (subject to peer review) in either a special issue of the International Journal of Corpus Linguistics or an edited collection.

We invite proposals for interesting and innovative research that would use approximately five million words of the upcoming Spoken BNC2014 as its primary source of data.

Successful applicants will gain access to the data via the CQPweb platform (cqpweb.lancs.ac.uk). Standard CQPweb functionality will be provided, including annotation (POS tagging, lemmatisation, semantic tagging), along with one new feature: the ability to search the corpus according to categories of speaker metadata such as gender, age, dialect and socio-economic status.

Proposals can approach the data from any theoretical angle, provided corpus methodologies are used and the research can be carried out within the affordances of CQPweb. Successful applicants will receive access to the data in February 2016 with a deadline for full paper submission in October 2016. Subject to peer review, papers will be published in one of the two Spoken BNC2014 launch publications in 2017 (a special issue of the International Journal of Corpus Linguistics has been agreed and a thematic edited collection is being planned).

This is a fantastic opportunity to work with the first very large, general corpus of informal British English conversation created since the original BNC more than twenty years ago. Successful applicants will get access to a large subset of the Spoken BNC2014 eighteen months before the full corpus is released, and will be the very first scholars to undertake and publish research based on this new dataset.

More details about the terms of the data grant scheme can be found in the application form. To apply, download and complete the application form and email it to Robbie Love (r.m.love@lancaster.ac.uk). The deadline for applications is Friday 11th December 2015.

Corpus compilation: working paper now available

We are pleased to announce that the CASS Corpus on Urban Violence in Brazil is now ready to be analysed. It contains a total of 5,127 articles (1,778,282 words) published between January and December 2014 by four Brazilian newspapers: Folha de São Paulo, Estado de São Paulo, Zero Hora and Pioneiro.

This working paper explains the process of compiling the corpus. It describes the selection of sources and individual texts and the preparation of the texts so that they can be processed using corpus linguistics techniques, and concludes with an overview of the corpus’ content.

Does it matter what pronoun you use?

Historically, in British English at least, if you didn’t know someone’s preferred gender it was considered grammatically correct to use he to refer to them, even if they might be female. Based on the justification that ‘the masculine includes the feminine’, this means that all of the following would be considered fine examples of English usage:

  • The driver in front is swerving like he is drunk.
  • A scientist is a fountain of knowledge; he should be respected.
  • Any student wishing to answer a question should raise his hand.
  • Everyone should consider his own family when choosing how to vote.

When you picture the people referred to in these scenarios, were any of them women? Or, to put it another way, were any of them any identity other than ‘male’? Evidence from psychological experiments has shown that the pronoun he (in all its forms) evokes a male image in the mind. Its use as a ‘generic’ pronoun, in contrast to what grammarians of old seemed to think, actually makes it harder to read and process sentences with stereotypically feminine referents (e.g. A childminder must wash his hands before feeding the children.).

So if you don’t want to go around assuming that all the world is male by default, what do you do? Luckily, there is a solution to this problem: if you don’t know a person’s gender identity, you can use the pronoun they to refer to them. There may be a mental screech of brakes here for those of you who were taught that they is a plural pronoun, but actually, it’s more versatile than that. Try using they for he in all of the sentences above. When thinking about the scientist or the driver, was there suddenly more than one? No. Indeed, singular they has been shown not to interfere with mental processing in the way that generic he does. I used it in the first sentence of this post and I’ll bet you didn’t even notice it. (Go on. Check.)

For those of you still not convinced, the use of singular they is widespread in spoken and written English. It’s highly likely that you use the form yourself without even thinking about it. In British Pronoun Use, Prescription and Processing (Palgrave 2014), an analysis of this type of pronoun demonstrates that singular they is ubiquitous in British English. If you still need more convincing, here’s a link to an extremely favourable review of that study just published in Language and Society.

Registration open for free upcoming event: “Language matters: communication, culture and society”

CASS is excited to announce an upcoming event at the International Anthony Burgess Foundation in Manchester on Thursday 12th November from 4pm-9pm.

“Language matters: communication, culture and society” is a mini-series of four informal talks showcasing the impact of language on society. The timely themes will be presented in an approachable manner that will be accessible to a general audience, stimulating to novice language researchers, and interesting to social scientists. Topics include hate speech, myths about impoliteness, and online aggression. Each talk incorporates an element of social science research beyond linguistics and we will take this opportunity to emphasise the importance of interdisciplinary work.

Afterwards, the audience will be invited to a drinks reception, during which they will have the opportunity to engage further with speakers and to network with guests.

In a single event, participants will have the opportunity to hear renowned scholars talk about their lives, their work, and what they find most interesting about the relationship between language and society. Talks are short, energetic, and pitched for a general audience.


  • “Impoliteness: The language of offence” – Jonathan Culpeper
  • “Vile Words. What is the case for criminalizing everyday hate speech as hate crime?” – Paul Iganski
  • “The ethics of investigating online aggression: where does ‘virtual’ end and ‘reality’ begin?” – Claire Hardaker
  • “Spoken English in UK society” – Robbie Love

This free event is part of the ESRC Festival of Social Science 2015. Please register online to book your place.

For a taste of what’s in store, please see this video recap of a similar event held in London last year. For more information, please visit the ESRC website.

Welcome to our newest CASS PhD student!

It’s the start of a new academic year, and the offices of CASS continue to get busier and busier! This week we welcomed our newest PhD student, Ruth Byrne, to the team. Here’s a bit about Ruth and her research, in her own words:

I’ve just begun the first year of my ESRC-funded PhD, and will be using the British Library’s 19th Century newspaper collection to explore historic attitudes to immigration. I completed my undergraduate and master’s degrees within the History department at Lancaster.

I’ve always been an avid reader and thrived on close textual analysis. So, although my background has firm roots in History, and not Linguistics, the study of language has naturally woven its way through much of my research. The main focus of my undergraduate study was the shifting media language surrounding the struggle for Indian Independence. Without realising it, I effectively conducted a manual hunt for collocates within lines of concordance, terms I was not to encounter until I heard about the work of CASS during my MA. Unaware of Corpus Linguistics as an approach, and of how it could have hugely increased my efficiency and rapidity, I was frequently frustrated at the laborious nature of the process which I had chosen to undertake.

Perhaps because I’ve found my own work and interests so hard to categorise, I’ve long been fascinated by the concept of interdisciplinary research. I was thrilled to find out that I’d be joining an experienced team who are pushing the boundaries of Corpus Linguistics as an interdisciplinary research tool, and that I’d be working at the intersection of two departments. I am keen to compare the challenges which face researchers working with corpora to those traditionally faced by historians working with large archives.

Some extra-academic trivia: I’m from a family of wine-merchants and spent most childhood holidays being dragged unwillingly around vineyards. As a result I’ve accumulated a lot of odd knowledge about grape varieties and whisky distilleries. When not working on my thesis, I’ll most likely be hiking up a hill in the Lake District.

MA students all pass with Distinction!

Róisín, Gillian and I were delighted to find out last week that we all passed our MA Language and Linguistics degrees with Distinction. Our degree programme included taking a wide range of modules, followed by two terms spent researching and writing a 25,000-word dissertation. All three of us used this opportunity to conduct pilot or exploratory studies in preparation for our PhD studies, which we are excited to be commencing now! You can see the titles and abstracts of our dissertations below:

Abi Hawtin

Methodological issues in the compilation of written corpora: an exploratory study for Written BNC2014

The Centre for Corpus Approaches to Social Science (CASS) at Lancaster University and Cambridge University Press have made an agreement to collaborate on the creation of a new, publicly accessible corpus of contemporary British English. The corpus will be called BNC2014, and will have two sub-sections: Spoken BNC2014 and Written BNC2014. BNC2014 aims to be an updated version of BNC1994 which, despite its age, is still used as a proxy for present day English. This dissertation is an exploratory study for Written BNC2014. I aim to address several methodological issues which will arise in the construction of Written BNC2014: balance and representativeness, copyright, and e-language. These issues will be explored, and decisions will be reached about how these issues will be dealt with when construction of the corpus begins.

Róisín Knight

Constructing a corpus of children’s writing for researching creative writing assessment: Methodological issues

In my upcoming PhD project, I wish to explore applications of corpus stylistics to Key Stage 3 creative writing assessment in the UK secondary National Curriculum. In order to carry out this research, it is necessary to have access to a corpus of Key Stage 3 students’ writing that has been marked using the National Curriculum criteria. Prior to this MA project, no corpus fulfilled all of these criteria.

This dissertation explores the methodological issues surrounding the construction of such a corpus by achieving three aims. Firstly, all of the design decisions required to construct the corpus are made, and justified. These decisions relate to the three main aspects of the corpus construction: corpus design; transcription; metadata, textual markup and annotation. Secondly, the methodological problems relating to these design decisions are discussed. It is argued that, although several problems exist, the majority can be overcome or mitigated in some way. The impact of problems that cannot be overcome is fairly limited. Thirdly, these design decisions are implemented, through undertaking the construction of the corpus, so far as was possible within the time constraints of the project.

Gillian Smith

Using Corpus Methods to Identify Scaffolding in Special Education Needs (SEN) Classrooms

Much research addresses teaching methods in Special Education Needs (SEN) classrooms, where language interventions are vital in providing children with developmental language disorders with language and social skills. Research in this field, however, is often limited by its use of small-scale samples and manual analysis. This study aims to address this problem, through applying a corpus-based method to the study of one teaching method, scaffolding, in SEN classrooms. Not only does this provide a large and therefore more representative sample of language use in SEN classrooms, the main body of this dissertation attempts to clarify and demonstrate that corpus methods may be used to search for scaffolding features within the corpus. This study, therefore, presents a systematic and objective way of searching for the linguistic features of scaffolding, namely questions, predictions and repetitions, within a large body of data. In most cases, this was challenging, however, as definitions of features are vague in psychological and educational literature. Hence, I focus on first clarifying linguistic specifications of these features in teacher language, before identifying how these may be searched for within a corpus. This study demonstrates that corpus-based methods can provide new ways of assessing language use in the SEN classroom, allowing systematic, objective searches for teaching methods in a larger body of data.

Changing Climates and the Media: Lancaster workshop

The Lancaster workshop on Changing Climates and the Media took place last Monday (21st Sep 2015). This was a joint event organised by the ESRC Centre for Corpus Approaches to Social Science (CASS) and the Department of Sociology, Lancaster University.

The workshop brought together leading academics from a wide range of disciplines – sociology, media studies, political and environmental sciences, psychology, and linguistics – as well as community experts from the Environment Agency and the Green Alliance. The result was a lively debate on the interaction between the news media and British society, and a critical reflection on people’s perception of the problem and effective ways to communicate the issue and promote changes in behaviour and practices.

Professor John Urry from Lancaster University opened the event with a brief overview of the major challenges posed by climate change. He also introduced the CASS project on Changing Climates, a corpus-based study of how climate change issues have been debated in the British and Brazilian news media in the past decade. This contrastive analysis is interesting for various reasons, including striking differences in public perception of the problem. While climate-change scepticism is prominent within the public debate in Britain, Brazil is a leading country in terms of concern about climate change, with nine-in-ten Brazilians considering global warming a very serious problem. Dr Carmen Dayrell presented some examples of fundamental differences between the media debate in these two countries. Unlike the British press, Brazilian newspapers articulate the discourse along the same lines as those advocated by the IPCC. This includes stressing the position of developed and developing nations and the projected consequences of the impact of climate change on the Earth’s system, such as the melting of polar icefields, loss of biodiversity and increased frequency of extreme weather events.

The Changing Climates project is currently being extended to Germany and Italy. Dr Marcus Müller from the Technische Universität Darmstadt discussed his preliminary findings on how the German news media has represented climate change issues. Dr M. Cristina Caimotto and Dr Osman Arrobbio from the University of Turin presented their initial observations of the Italian context and data. The Changing Climates presentation concluded with insightful comments by Dr Glenn Watts, the Environment Agency’s research lead on climate change and resource use and Lancaster’s primary partner in the Changing Climates project.

The afternoon session explored climate change from various perspectives. It started with Professor Reiner Grundmann from the University of Nottingham, who presented corpus research on the media coverage of climate change across Britain, Germany, France and the US. Dr James Painter from the University of Oxford and Dr Neil Gavin from the University of Liverpool focused on the coverage of the UN IPCC reports in the news media and television respectively.

The focus then turned to the British parliament and the 2009 debate on the Climate Change Bill. How do politicians talk about climate change in public? This question was addressed by Rebecca Willis, a PhD candidate at Lancaster University and a member of the Green Alliance. Following that, Dr Neil Simcock, also from Lancaster University, explored the representations of ‘essential’ energy use in the UK media. The session concluded with Professor Alison Anderson from Plymouth University’s talk on the role of local news media in communicating climate change issues.

Our sincere thanks to all participants of the Lancaster workshop for making it a unique and very special event. This was an excellent opportunity to exchange ideas and share experiences which we hope will foster enhanced collaboration between the various disciplines.


“Fleeing, Sneaking, Flooding” – The importance of language in the EU migrant crisis

With tensions over the current EU migrant crisis increasing, we at CASS thought it would be timely to highlight the importance of the language used in the debate about this humanitarian crisis. In this paper, by Paul Baker and Costas Gabrielatos, the authors analyse the construction of refugees and asylum seekers in UK press articles.
For readers who do not have access to Sage, you can find a final draft of the paper here free of charge. Please note that this version of the paper has the tables and figures at the end of the paper.


New CASS Briefing now available – Analysing narratives in the Corporate Financial Information Environment

Transparent and effective communication between firms and the investment community is a key determinant of corporate success. Audited financial statements and associated narrative disclosures are among the main methods that firms use to communicate with investors and analysts. These disclosures combine with information from financial journalists and other market commentators to form the Corporate Financial Information Environment (CFIE). While a considerable body of work exists on financial narratives, research has been limited by the methods used for measuring the characteristics and quality of such disclosures. In particular, the need to hand-collect relevant data from firms’ annual reports and the subjectivity of textual scoring based on manual methods have restricted progress. Recent advances in computational and corpus linguistics provide a basis for undertaking more sophisticated analyses.

New resources are being added regularly to the new CASS: Briefings tab above, so check back soon.

The Spoken British National Corpus 2014 – project update

It has been a little over a year since CASS and Cambridge University Press announced a collaboration to compile a successor to the spoken component of the British National Corpus, the Spoken BNC2014. This will be the largest corpus of spoken British English since the original, with the advantage of being collected in the 2010s rather than the 1990s, providing an updated snapshot of spoken language in the UK. By including a set of recordings already gathered by Cambridge University Press before our collaboration began, we plan for the corpus to contain data from 2012 to 2016. As well as being the year in which the project was announced, 2014 will be the median year of the planned data range, and so we chose it to feature in the working title of the project: the Spoken BNC2014.

Since our announcement, we have been hard at work: advertising the project nationally, collecting recordings from speakers from all over the UK, transcribing the data, conducting methodological investigations, and presenting our work so far at corpus linguistics conferences. At ICAME 36 in May we described the development of the Spoken BNC2014 transcription scheme, and at Corpus Linguistics 2015 in July we gave an overview of the data collection methodology as well as presenting new research on speaker identification in transcription. All of this activity continues as we work towards making the corpus freely and publicly available in the year 2017.

So far, we have gathered nearly 700 recordings, at an estimated total of approximately six million words of informal conversational data. The majority of recordings feature two or three speakers, with about a quarter containing four or more. The balance of speaker gender is fairly even, and we have been able to gather data from a wide range of ages – though at the moment the 19-29 year olds have a clear lead! We have done very well in England to gather recordings from a great range of self-reported dialects, and we plan now to focus more heavily on gathering recordings from Wales, Scotland, and Northern Ireland. The word cloud of self-reported conversation topics gives a first look at the range of things that users can expect to find being discussed in the corpus.

We are very pleased with the progress of the project so far, and we look forward to releasing the corpus texts publicly once they are complete. In the meantime, as announced at CL2015, we will be offering the opportunity to apply for pre-release data grants later this year. More information about the data grants will be announced in the near future.