User Involvement: CASS go to CLARIN PLUS workshop

At the beginning of June, I attended the CLARIN PLUS workshop on User Involvement held in Helsinki, the Finnish capital. CLARIN stands for “Common Language Resources and Technology Infrastructure”; it is an international research infrastructure which provides scholars in the social sciences and humanities with easy access to digital language data, and also advanced tools to handle those data sets. The main purpose of the workshop was to share information, good practice, expertise, and ideas on how potential and current users can most benefit from CLARIN services.

I was representing Lancaster University as part of the UK branch of CLARIN, which is led by Martin Wynne at Oxford. Some of the participants, representing CLARIN’s different national consortia, shared success stories of their involvement with the local community.

At the workshop, Johanna Berg, from Sweden, and Mietta Lennes, from Finland, showed us how they made innovative use of the roadshow event format to present some language resources across different institutions in their countries. Mietta also gave us a taste of the very useful tools and corpora that you can find at The Language Bank of Finland.

Another fruitful example presented at the workshop was the Helsinki Digital Humanities Hackathons. The event, which is in its third edition, brings together researchers from computer science, humanities and social sciences for a week of intensive work sharing a diversity of skills. Eetu Mäkelä, one of the organisers of the DHH, demonstrated that it is possible to engage researchers from very different backgrounds and have them working in a complementary way. The impressive results of last year’s edition can be checked out at the DHH16 website.

At the end of two profitable days, Darja Fišer, director of CLARIN-ERIC User Involvement, wrapped up the event by presenting other amazing experiences across several institutions connected to CLARIN. One of the success stories she mentioned was the Corpus Linguistics: Method, Analysis, Interpretation MOOC offered by CASS, which will be running again in Autumn this year (you can register your interest here!). Darja also highlighted the importance of events such as summer schools to reach out to more users. Indeed, Darja shared some incredible resources and insightful ideas at our recent Summer Schools in Corpus Linguistics and other Digital methods (#LancsSS17). Make sure you read our next blog post for a summary of the summer school week!

Spoken BNC2014 Symposium

On the afternoon of Monday 26th June, CASS hosted a special symposium to celebrate the upcoming public launch of the Spoken British National Corpus 2014 – a corpus which members of CASS and Cambridge University Press have spent the last three years compiling.

More than fifty guests attended, representing a mixture of Lancaster Summer Schools participants, members of the CASS Challenge Panel, and those who travelled to Lancaster just for the day.

To kick off the symposium, CASS Centre Director Andrew Hardie said a few words about the history of Corpus Linguistics at Lancaster University, and put the compilation of a new BNC into context against previous developments in the field. He expressed his delight at the interest in the Spoken BNC2014 project as evidenced by the number of guests who were in attendance for the symposium.

I then gave the first talk alongside Claire Dembry (from Cambridge University Press) and Andrew Hardie, as representatives of the Spoken BNC2014 research team which also includes Vaclav Brezina and Tony McEnery. We discussed the main methodological decisions we made when thinking about the design, data collection, transcription and processing of the corpus. Andrew then gave a quick demonstration of the corpus in CQPweb, showing how features including speaker IDs, overlaps and attribution confidence are displayed in the interface.

Following our talk came the first of four research presentations, all of which used (the early access subset of) the Spoken BNC2014. The first of these was a talk by Karin Aijmer (University of Gothenburg) about the intensifier fucking, which went down very well with the audience. Karin’s Spoken BNC2014 research, which also includes other intensifiers, will be published as a chapter in Brezina et al. (forthcoming).

After a short break for refreshments, Jacqueline Laws (University of Reading) presented research into verb-forming suffixation which she had undertaken with Chris Ryder and Sylvia Jaworska. Comparing the demographically-sampled component of the Spoken BNC1994 to the new Spoken BNC2014, she found that females now appear to produce more neologisms (e.g. favouritize, popify) compared to males. Laws et al.’s research will be published in a forthcoming special issue of the International Journal of Corpus Linguistics.

Susan Reichelt (Lancaster University) was next to present her work on producing sociolinguistically comparable subsets of both the original and new Spoken British National Corpora. She highlighted a point which I had touched upon in my earlier talk: that the compilation of the Spoken BNC2014 sought to strike a balance between direct comparability with the original corpus on the one hand, and methodological improvement on the other. The areas where improvement was favoured over comparability (e.g. the classification of speaker socio-economic status) ought to be considered especially when thinking about sociolinguistic analysis. Susan’s work is associated with the recently announced CASS SDA project.

Finally, Jonathan Culpeper and Mathew Gillings (Lancaster University) presented their work on politeness variation between the north and south of England. They aimed to assess the extent to which commonly held stereotypes about differences between northern and southern politeness were reflected in language use in both the original and new corpora as a single dataset. Their work will be published as a chapter in Brezina et al. (forthcoming).

My reaction as the organiser of the symposium was that there is definitely a sense of anticipation about the release of the Spoken BNC2014, which is planned to take place in the autumn. Furthermore, it was lovely to meet so many friendly and enthusiastic attendees. I am very grateful to each of the speakers for giving such interesting talks, and to all who attended – especially those who tweeted their reactions to the talks using the #BNC2014 hashtag! As one of my final duties as a member of CASS before moving on to pastures new, I am very glad that the symposium went as well as it did.

CASS goes to the Wellcome Trust!

Earlier this month I represented CASS in a workshop, hosted by the Wellcome Trust, which was designed to explore the language surrounding patient data. The remit of this workshop was to report back to the Trust on what might be the best ways to communicate to patients about their data, their rights respecting their data, and issues surrounding privacy and anonymity. The workshop comprised nine participants who all communicated with the public as part of their jobs, including journalists, bloggers, a speech writer, a poet, and a linguist (no prizes for guessing who the last of these was…). On a personal note, I had prepared for this event from the perspective of a researcher of health communication. However, the backgrounds of the other participants meant that I realised very quickly that my role in this event would not be so specific, so niche, but was instead much broader, as “the linguist” or even “the academic”.

Our remit was to come up with a vocabulary for communication about patient data that would be easier for patients to understand. As it turned out, this wasn’t too difficult, since most of the language surrounding patient data is waffly at its best, and overly-technical and incomprehensible at its worst. One of the most notable recommendations we made concerned the phrase ‘patient data’ itself, which we thought might carry connotations of science and research, and perhaps disengage the public, and so recommended that the phrase ‘patient health information’ might sound less technical and more transparent. We undertook a series of tasks which ranged from sticking post-it notes on whiteboards and windows, to role play exercises and editing official documents and newspaper articles. What struck me, and what the diversity of these tasks demonstrated particularly well, was how the suitability of our suggested terms could only really be assessed once we took the words off the post-it notes and inserted them into real-life communicative situations, such as medical consultations, patient information leaflets, newspaper articles, and even talk shows.

The most powerful message I took away from the workshop was that close consideration of linguistic choices in the rhetoric surrounding health is vital for health care providers to improve the ways that they communicate with the public. To this end, as a collection of methods that facilitate the analysis of large amounts of authentic language data in and across a variety of texts and contexts, corpus linguistics has an important role to play in providing such knowledge in the future. Corpus linguistic studies of health-related communication are currently small in number, but continue to grow apace. Although the health-related research that is being undertaken within CASS, such as Beyond the Checkbox and Metaphor in End of Life Care, goes some way to showcasing the rich fruits that corpus-based studies of health communication can bear, there is still a long way to go. In particular, future projects in this area should strive to engage consumers of health research not only in terms of our findings, but also the (corpus) methods that we have used to get there.

Upcoming CASS Psycholinguistics Seminar

CASS is excited to announce an upcoming half-day research seminar on the theme of “Corpus Data and Psycholinguistics”. The event will take place on Thursday 19th May 2016 at 1-5pm in Furness Lecture Theatre 3.

The aim of the event is to bring together researchers with an interest in combining methods from corpus linguistics and psycholinguistics. In particular, there will be a focus on experimental psycholinguistics. It is set to be an exciting afternoon consisting of four 40-minute presentations from both internal and external speakers. Professor Padraic Monaghan from the Department of Psychology will be giving an introduction to computational modelling in psycholinguistics, and I will be presenting my work on investigating the processing of collocation using EEG. Furthermore, Dr Phil Durrant from the University of Exeter will be giving a talk entitled “Revisiting collocational priming”, and Professor Michaela Mahlberg from the University of Birmingham will be discussing the methodological issues associated with combining eye-tracking techniques with corpus data.

You can find out more about these talks from the abstracts below.

Padraic Monaghan, Lancaster University

Computational modelling of corpus data in psycholinguistic studies

Computational models of language learning and processing enable us to determine the inherent structure present in language input, and also the cognitive mechanisms that react to this structure. I will give an introduction to computational models used in psycholinguistic studies, with a particular focus on connectionist models where the structure of processing is derived principally from the structure of the input to the model.

Phil Durrant, University of Exeter

Revisiting collocational priming

Durrant & Doherty (2010) evaluated whether collocations at different levels of frequency exhibit psycholinguistic priming. It also attempted to untangle collocation from the related phenomenon of psychological association by comparing collocations which were and were not associates. Priming was found between high-frequency collocations but associated collocates appeared to exhibit more deep-rooted priming (as reflected in a task designed to reflect automatic, rather than strategic processes) than those which were not associated. This presentation will critically review the 2010 paper in light of more recent work. It will re-evaluate the study itself and suggest ways in which research could be taken forward.

Durrant, P., & Doherty, A. (2010). Are high-frequency collocations psychologically real? Investigating the thesis of collocational priming. Corpus linguistics and linguistic theory, 6(2), 125-155.

Jennifer Hughes, Lancaster University

Investigating the processing of collocation using EEG: A pilot study

In this presentation, I discuss the results of an EEG experiment which pilots a procedure for determining whether or not there is a quantitatively distinct brain response to the processing of collocational bigrams compared to non-collocational bigrams. Collocational bigrams are defined as adjacent word pairs which have a high forward transitional probability in the BNC (e.g. crucial point), while non-collocational bigrams are defined as adjacent word pairs which are semantically plausible but are absent from the BNC (e.g. crucial night). The results show that there is a neurophysiological difference in how collocational bigrams and non-collocational bigrams are processed.
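
For readers unfamiliar with the measure, forward transitional probability is usually estimated as the frequency of a word pair divided by the frequency of its first word. The Python sketch below illustrates that general calculation only. The abstract does not state the exact formula, tokenisation, or threshold used in the study, so the function name and the toy sentence are assumptions made for illustration and are not BNC data.

```python
from collections import Counter


def forward_transitional_probability(tokens, first, second):
    """Estimate P(second | first) as count(first second) / count(first)."""
    # Count only words that have a following word, so the denominator
    # matches the positions where a bigram could start.
    first_word_counts = Counter(tokens[:-1])
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    if first_word_counts[first] == 0:
        return 0.0
    return bigram_counts[(first, second)] / first_word_counts[first]


# Toy sentence invented for illustration (an assumption, not corpus data).
toy_tokens = "this is a crucial point and a crucial point is a crucial factor".split()

ftp_point = forward_transitional_probability(toy_tokens, "crucial", "point")
ftp_night = forward_transitional_probability(toy_tokens, "crucial", "night")

print(f"crucial -> point: {ftp_point:.2f}")
print(f"crucial -> night: {ftp_night:.2f}")
```

In this toy data, "crucial" is followed by "point" in two of its three occurrences, while "crucial night" never occurs. This mirrors the contrast between an attested and an unattested bigram described in the abstract.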

Michaela Mahlberg, Kathy Conklin, and Gareth Carrol, University of Birmingham

Exploring corpus-attested patterns in Dickens’s fiction – methodological challenges of using eye-tracking techniques

The study of the relationship between patterns and meanings is a key concern in corpus linguistics. The data that corpus linguists work with, however, only provides a partial picture. In this paper, we will look at how questions of frequencies in corpora can be related to questions raised by data from eye-tracking studies on reading times. We will also discuss challenges of designing experiments to address these questions. As a case study, we focus on examples of patterns identified in Dickens’s fiction, but the methodological issues we address have wider implications beyond the study of literary corpora.

The event is free to attend and is open to both internal and external attendees. If you are an external guest, please email in advance so we know that you intend to come.

We are really looking forward to this event as it will be an exciting opportunity to share ideas regarding the different approaches to using corpus data in experimental psycholinguistics.

FireAnt Launch Event

We will be running a launch event and workshop for a new software tool that we have created called FireAnt. The event and workshop will be held from 13:00 to 17:00 on Monday 22nd February 2016 here at Lancaster University.

FireAnt was created by Laurence Anthony as part of the 2015 ESRC-funded CASS-affiliated DOOM project on social media analysis. FireAnt is a free and easy-to-use tool designed to help corpus linguists and social scientists analyze Twitter and other social network data without the need for programming or database management skills. The following features of the tool will be explored in this workshop:

  • import different formats of data (e.g. Twitter data in JSON format, Reddit data in CSV format, etc.)
  • search that data and its associated metadata in a variety of ways (e.g., retrieve all tweets containing #blacklivesmatter sent in December 2015)
  • export the results to other formats including a plain text file for “standard” corpus analysis, an Excel/CSV file for statistical analysis, a timeline chart, and a network graph
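
To give a sense of what the search example in the list above involves when done by hand, here is a minimal Python sketch. It is not FireAnt code and says nothing about how FireAnt works internally. The field names "text" and "created_at", the ISO-style timestamps, and the three sample records are all assumptions made for illustration; real Twitter exports use a different layout.

```python
import json
from datetime import datetime

# Hypothetical records: field names and timestamp format are assumptions.
raw_json = """[
  {"text": "Marching today #BlackLivesMatter", "created_at": "2015-12-05T14:30:00"},
  {"text": "Happy new year!", "created_at": "2016-01-01T00:01:00"},
  {"text": "#blacklivesmatter rally photos", "created_at": "2015-11-30T09:00:00"}
]"""


def filter_tweets(tweets, hashtag, year, month):
    """Keep tweets that contain the hashtag and were sent in the given month."""
    wanted = hashtag.lower()
    matches = []
    for tweet in tweets:
        sent = datetime.fromisoformat(tweet["created_at"])
        # Whitespace tokenisation only; a hashtag followed directly by
        # punctuation would be missed by this simple sketch.
        hashtags = {word.lower() for word in tweet["text"].split() if word.startswith("#")}
        if wanted in hashtags and (sent.year, sent.month) == (year, month):
            matches.append(tweet)
    return matches


tweets = json.loads(raw_json)
december_hits = filter_tweets(tweets, "#blacklivesmatter", 2015, 12)

for tweet in december_hits:
    print(tweet["text"])
```

Only the first sample record satisfies both conditions: the second has no hashtag and the third was sent in November.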

We will be providing lunch at the start of the event and all materials for the workshop (including the software and help guide) on a USB drive. The schedule for the day can be found below.


Time         Agenda
13:15-14:15  PDR Room: Lunch
14:15-14:30  Introduction, log on, etc.
14:30-15:30  FireAnt basics
15:30-15:45  Refuel: Coffee break
15:45-16:45  FireAnt advanced
16:45-17:00  Q&As, requests, bouquets, encores

Please note that places are extremely limited and must be booked in advance. If you would like to attend, please email Claire Hardaker in the first instance.

Registration open for free upcoming event: “Language matters: communication, culture and society”

CASS is excited to announce an upcoming event at the International Anthony Burgess Foundation in Manchester on Thursday 12th November from 4pm-9pm.

“Language matters: communication, culture and society” is a mini-series of four informal talks showcasing the impact of language on society. The timely themes will be presented in an approachable manner that will be accessible to a general audience, stimulating to novice language researchers, and interesting to social scientists. Topics include hate speech, myths about impoliteness, and online aggression. Each talk incorporates an element of social science research beyond linguistics and we will take this opportunity to emphasise the importance of interdisciplinary work.

Afterwards, the audience will be invited to a drinks reception, during which they will have the opportunity to engage further with speakers and to network with guests.

In a single event, participants will have the opportunity to hear renowned scholars talk about their lives, their work, and what they find most interesting about the relationship between language and society. Talks are short, energetic, and pitched for a general audience.


  • “Impoliteness: The language of offence” – Jonathan Culpeper
  • “Vile Words. What is the case for criminalizing everyday hate speech as hate crime?” – Paul Iganski
  • “The ethics of investigating online aggression: where does ‘virtual’ end and ‘reality’ begin?” – Claire Hardaker
  • “Spoken English in UK society” – Robbie Love

This free event is part of the ESRC Festival of Social Science 2015. Please register online to book your place.

For a taste of what’s in store, please see this video recap of a similar event held in London last year. For more information, please visit the ESRC website.

Call for Participation: ESRC Summer School in Corpus Approaches to Social Science

The ESRC Summer School in Corpus Approaches to Social Science was inaugurated in 2013; the 2014 event is the second in the series. It will take place from 15th to 18th July 2014, at Lancaster University, UK.

This free-to-attend summer school takes place under the aegis of CASS, an ESRC research centre bringing a new method in the study of language – the corpus approach – to a range of social sciences. CASS is investigating the use and manipulation of language in society in a host of areas of pressing concern, including climate change, hate crime and education.

Who can attend?

A crucial part of the CASS remit is to provide researchers across the social sciences with the skills needed to apply the tools and techniques of corpus linguistics to the research questions that matter in their own discipline. This event is aimed at junior social scientists – especially PhD students and postdoctoral researchers – in any of the social science disciplines. Anyone with an interest in the analysis of social issues via text and discourse – especially on a large scale – will find this summer school of interest.


The programme consists of a series of intensive two-hour sessions, some involving practical work, others more discussion-oriented.

Topics include: Introduction to corpus linguistics; Corpus tools and techniques; Collecting corpus data; Foundational techniques for social science data – keywords and collocation; Understanding statistics for corpus analysis; Discourse analysis for the social sciences; Semantic annotation and key domains; Corpus-based approaches to metaphor in discourse; Pragmatics, politeness and impoliteness in the corpus.

Speakers include Tony McEnery, Paul Baker, Jonathan Culpeper, and Elena Semino.

The CASS Summer School is one of the three co-located Lancaster Summer Schools in Interdisciplinary Digital Methods; see the website for further information.

How to apply

The CASS Summer School is free to attend, but registration in advance is compulsory, as places are limited.

The deadline for registrations is Sunday 8th June 2014.

The application form is available on the event website, as is further information on the programme.


Dispatch from YLMP2014


I recently had the pleasure of travelling to Poland to attend the Young Linguists’ Meeting in Poznań (YLMP), a congress for young linguists who are interested in interdisciplinary research and stepping beyond the realm of traditional linguistic study. Hosted over three days by the Faculty of English at Adam Mickiewicz University, the congress featured over 100 talks by linguists young and old, including plenary lectures by Lancaster’s very own Paul Baker and Jane Sunderland. I was one of three Lancaster students to attend the congress, along with undergraduate Agnes Szafranski and fellow MA student Charis Yang Zhang.

What struck me about the congress, aside from the warm hospitality of the organisers, was the sheer breadth of topics that were covered over the weekend. All of the presenters were more than qualified to describe their work as linguistics, but perhaps for the first time I saw within just how many domains such a discipline can be applied. At least four sessions ran in parallel at any given time, and themes ranged from gender and sexuality to EFL and even psycholinguistics. There were optional workshops as well as six plenary talks. On the second day of the conference, as part of the language and society stream, I presented a corpus-assisted critical discourse analysis of the UK national press reporting of the immediate aftermath of the May 2013 murder of soldier Lee Rigby. I was happy to have a lively and engaged audience who had some really interesting questions for me at the end, and I enjoyed the conversations that followed this at the reception in the evening!

What was most encouraging about the congress was the drive and enthusiasm shared by all of the ‘young linguists’ in attendance. I now feel part of a generation of young minds who are hungry to improve not only our own work but hopefully, in time, the field(s) of linguistics as a whole. After my fantastic experience at the Boya Forum at Beijing Foreign Studies University last autumn, I was happy to spend time again celebrating the work of undergraduate and postgraduate students, and early-career linguists. There was a willingness to listen, to share ideas, and to (constructively) criticise where appropriate, and as a result I left Poznań feeling very optimistic about the future of linguistic study. I look forward to returning to the next edition of YLMP, because from what I saw at this one, there is a new generation of linguists eager to push the investigation of language to the next level.

Rude Britannia – what our politeness says about our nation

Britain is still a nation of polite people, and fears that texts, tweets and Facebook are making people ruder are a myth, according to research from Lancaster University’s Faculty of Arts and Social Sciences (FASS). The British are famous for their reserve, their indirect way of saying things and their love of queuing. However, new research shows that what we find polite and what we find rude are unique to our culture and can be very different from notions of rudeness in other cultures.

The research, carried out by Professor Jonathan Culpeper, an expert in linguistic politeness, will be presented at an event as part of the Economic and Social Research Council’s annual Festival of Social Science, which runs from 2 to 9 November 2013.

Notes from the 3rd annual Boya Forum 2013 Undergraduate Conference

If, six months ago, you had told me that an assignment I was writing during my undergraduate degree would eventually send me to China for the weekend, I wouldn’t have believed you. However, that is exactly what I found myself doing last weekend, when I travelled to Beijing Foreign Studies University to present at the 3rd annual Boya Forum 2013 undergraduate conference. I was one of two students from Lancaster University sent there to present at the event, which aimed to celebrate the undergraduate research abilities of students in the areas of English literature, translation studies, media and communication studies, cultural studies, international and area studies and, most relevant to my work, language studies. The participants represented a total of 27 universities, and Lancaster was one of only three universities from outside China, the others being Columbia University in New York and Rollins College in Florida.

The conference ran four concurrent panels of talks at any given time, meaning that in just one day we produced a total of 70 individual presentations. It was an intense day of talks and discussions that ran from the early morning right through into the evening, and my talk was right at the end of the day, so I knew I would have a job keeping my audience’s attention. I presented a corpus-based critical discourse analysis of a parliamentary debate about the Marriage (Same Sex Couples) Bill, which seems to have been my party trick over the summer (I presented a poster on it at the Corpus Linguistics 2013 conference in July and gave a talk about it at a PhD course in Copenhagen in August). Afterwards I was asked some really interesting questions about my work by both the professor who acted as “commentator” for the session and other students in attendance. It was a great opportunity to reflect on my work and think about what I might do differently the next time I do a similar piece of analysis. It was also really great to see four or five other presentations from Chinese students who had used corpus-based techniques in their research, and to be able to discuss how our approaches differ.

At the end of the day there was a closing ceremony where the professors from BFSU awarded prizes for the best presentations of the conference, based on the ratings of the commentators from each panel. I was very happy to be one of nine recipients of a “First Prize for Best Presentation” award and an official BFSU jacket to match. I wore it proudly on the journey back to Lancaster.

The organisers of the Boya Forum 2013 undergraduate conference should be proud of what they are doing. As a recently graduated BA student I completely agree that the research potential of undergraduate students, particularly in arts and social science-based disciplines, should be valued and celebrated more. Events like this are a brilliant way of showing undergraduate students that their work is valued beyond the difference between a first and a 2:1. This was the first year of the conference’s short history that students from outside of China had contributed to the event, and it was great to hear that the organisers hope to invite an even wider international presence next year. Though, unfortunately, I will no longer qualify to present next time, I look forward to hearing about more undergraduate students from Lancaster and elsewhere travelling to Beijing to present at Boya Forum 2014. It certainly was a fantastic experience, and I am extremely grateful to CASS and BFSU for jointly funding my visit.