Participants needed for EEG experiment!

For my PhD I am trying to find out how language is processed in the brain by combining methods from corpus linguistics and psycholinguistics. Specifically, I have extracted real language data from the British National Corpus and modified this data so that it can be presented to participants in an electroencephalography (EEG) experiment. In EEG experiments, electrodes are placed on a participant’s head and these electrodes detect some of the electrical activity that occurs in the participant’s brain in response to particular stimuli. EEG experiments are frequently conducted in Lancaster’s Psychology Department but they have not yet been conducted in the Department of Linguistics and English Language, so it’s really exciting to try out this method which is new to the department.

When conducting an EEG experiment, I start by taking head measurements and then placing a headcap on the participant’s head. This headcap contains 64 electrode holders which I fill with conductive gel before placing an electrode into each one. I also attach some additional electrodes behind the ears and around the eyes. Once all of the electrodes are in place, the stimuli are displayed to the participant on a computer screen. These stimuli consist of sentences that are presented word-by-word, as well as true/false statements that are presented as whole sentences. Participants just need to read the word-by-word sentences and respond to the true/false statements by pressing either the ‘T’ or the ‘F’ key on the keyboard. While they’re doing this, the electrodes detect some of the electrical activity that is happening in the brain, and this information is sent to another computer which displays the electrical activity as a continuous waveform. The setup of the experiment can be seen in the diagram below.

[Diagram: the setup of the EEG experiment]






Throughout my PhD I will be conducting a series of experiments, starting with a pilot study. In my pilot study, the experiment itself lasts just 10 minutes, but it can take me up to an hour to attach all of the electrodes. This preparation time should decrease as I gain practice with more participants.

I have already conducted several practice runs of my experiment with other postgraduate students. For example, Gillian Smith, another PhD research student in CASS, agreed to take part in one of my practice runs and here she describes her experience as a participant:

[Photo: Gillian taking part in the EEG experiment]


“Getting to be involved in Jen’s experiment was a great opportunity! Having never participated in such a study before, I found the whole process (which Jen explained extremely well) very interesting. I particularly enjoyed being able to look at my brainwaves after, which is something I have never experienced. Likewise, having electrodes on my head was a lovely novelty.”



I am currently looking for 15 native speakers of English to take part in my pilot study.

If you are interested in taking part in this experiment, please email me for more information.

Does it matter what pronoun you use?

Historically, in British English at least, if you didn’t know someone’s preferred gender it was considered grammatically correct to use he to refer to them, even if they might be female. Based on the justification that ‘the masculine includes the feminine’, this means that all of the following would be considered fine examples of English usage:

  • The driver in front is swerving like he is drunk.
  • A scientist is a fountain of knowledge; he should be respected.
  • Any student wishing to answer a question should raise his hand.
  • Everyone should consider his own family when choosing how to vote.

When you picture the people referred to in these scenarios, were any of them women? Or, to put it another way, did any of them have any identity other than ‘male’? Evidence from psychological experiments has shown that the pronoun he (in all its forms) evokes a male image in the mind. Its use as a ‘generic’ pronoun, in contrast to what grammarians of old seemed to think, actually makes it harder to read and process sentences with stereotypically feminine referents (e.g. A childminder must wash his hands before feeding the children.).

So if you don’t want to go around assuming that all the world is male by default, what do you do? Luckily, there is a solution to this problem: if you don’t know a person’s gender identity, you can use the pronoun they to refer to them. There may be a mental screech of brakes here for those of you who were taught that they is a plural pronoun, but actually, it’s more versatile than that. Try using they for he in all of the sentences above. When thinking about the scientist or the driver, was there suddenly more than one? No. Indeed, singular they has been shown not to interfere with mental processing in the way that generic he does.  I used it in the first sentence of this post and I’ll bet you didn’t even notice it. (Go on. Check.)

For those of you still not convinced: the use of singular they is widespread in spoken and written English, and it’s highly likely that you use the form yourself without even thinking about it. In British Pronoun Use, Prescription and Processing (Palgrave 2014), an analysis of this type of pronoun demonstrates that singular they is ubiquitous in British English. If you still need more convincing, here’s a link to an extremely favourable review of that study, just published in Language and Society.

MA students all pass with Distinction!

Róisín, Gillian, and I were delighted to find out last week that we all passed our MA Language and Linguistics degrees with Distinction. Our degree programme included taking a wide range of modules, followed by two terms spent researching and writing a 25,000-word dissertation. All three of us used this opportunity to conduct pilot or exploratory studies in preparation for our PhD studies, which we are excited to be commencing now! You can see the titles and abstracts of our dissertations below:

Abi Hawtin

Methodological issues in the compilation of written corpora: an exploratory study for Written BNC2014

The Centre for Corpus Approaches to Social Science (CASS) at Lancaster University and Cambridge University Press have made an agreement to collaborate on the creation of a new, publicly accessible corpus of contemporary British English. The corpus will be called BNC2014, and will have two sub-sections: Spoken BNC2014 and Written BNC2014. BNC2014 aims to be an updated version of BNC1994 which, despite its age, is still used as a proxy for present day English. This dissertation is an exploratory study for Written BNC2014. I aim to address several methodological issues which will arise in the construction of Written BNC2014: balance and representativeness, copyright, and e-language. These issues will be explored, and decisions will be reached about how these issues will be dealt with when construction of the corpus begins.

Róisín Knight

Constructing a corpus of children’s writing for researching creative writing assessment: Methodological issues

In my upcoming PhD project, I wish to explore applications of corpus stylistics to Key Stage 3 creative writing assessment in the UK secondary National Curriculum. In order to carry out this research, it is necessary to have access to a corpus of Key Stage 3 students’ writing that has been marked using the National Curriculum criteria. Prior to this MA project, no corpus fulfilled all of these criteria.

This dissertation explores the methodological issues surrounding the construction of such a corpus by achieving three aims. Firstly, all of the design decisions required to construct the corpus are made and justified. These decisions relate to the three main aspects of the corpus construction: corpus design; transcription; and metadata, textual markup and annotation. Secondly, the methodological problems relating to these design decisions are discussed. It is argued that, although several problems exist, the majority can be overcome or mitigated in some way. The impact of problems that cannot be overcome is fairly limited. Thirdly, these design decisions are implemented through undertaking the construction of the corpus, so far as was possible within the limited time constraints of the project.

Gillian Smith

Using Corpus Methods to Identify Scaffolding in Special Education Needs (SEN) Classrooms

Much research addresses teaching methods in Special Education Needs (SEN) classrooms, where language interventions are vital in providing children with developmental language disorders with language and social skills. Research in this field, however, is often limited by its use of small-scale samples and manual analysis. This study aims to address this problem by applying a corpus-based method to the study of one teaching method, scaffolding, in SEN classrooms. Not only does this provide a larger and therefore more representative sample of language use in SEN classrooms, but the main body of this dissertation also attempts to clarify and demonstrate that corpus methods may be used to search for scaffolding features within the corpus. This study, therefore, presents a systematic and objective way of searching for the linguistic features of scaffolding, namely questions, predictions and repetitions, within a large body of data. This was challenging, however, as definitions of these features in the psychological and educational literature are vague. Hence, I focus on first clarifying linguistic specifications of these features in teacher language, before identifying how these may be searched for within a corpus. This study demonstrates that corpus-based methods can provide new ways of assessing language use in the SEN classroom, allowing systematic, objective searches for teaching methods in a larger body of data.

Changing Climates and the Media: Lancaster workshop

The Lancaster workshop on Changing Climates and the Media took place last Monday (21st Sep 2015). This was a joint event organised by the ESRC Centre for Corpus Approaches to Social Science (CASS) and the Department of Sociology, Lancaster University.

The workshop brought together leading academics from a wide range of disciplines – sociology, media studies, political and environmental sciences, psychology, and linguistics – as well as community experts from the Environment Agency and the Green Alliance. The result was a lively debate on the interaction between the news media and British society, and a critical reflection on people’s perception of the problem and on effective ways to communicate the issue and promote changes in behaviour and practices.

Professor John Urry from Lancaster University opened the event with a brief overview of the major challenges posed by climate change. He also introduced the CASS project on Changing Climates, a corpus-based study of how climate change issues have been debated in the British and Brazilian news media over the past decade. This contrastive analysis is interesting for various reasons, including striking differences in public perception of the problem. While climate-change scepticism is prominent within the public debate in Britain, Brazil is a leading country in terms of concern about climate change, with nine in ten Brazilians considering global warming a very serious problem. Dr Carmen Dayrell presented some examples of fundamental differences between the media debate in these two countries. Unlike the British press, Brazilian newspapers articulate the discourse along the same lines as those advocated by the IPCC. This includes stressing the position of developed and developing nations and the projected consequences of the impact of climate change on the Earth’s system, such as the melting of polar icefields, loss of biodiversity and increased frequency of extreme weather events.

The Changing Climates project is currently being extended to Germany and Italy. Dr Marcus Müller from the Technische Universität Darmstadt discussed his preliminary findings on how the German news media has represented climate change issues. Dr M. Cristina Caimotto and Dr Osman Arrobbio from the University of Turin presented their initial observations of the Italian context and data. The Changing Climates presentation concluded with insightful comments by Dr Glenn Watts, the Environment Agency’s research lead on climate change and resource use and Lancaster’s primary partner in the Changing Climates project.

The afternoon session explored climate change from various perspectives. It started with Professor Reiner Grundmann from the University of Nottingham, who presented corpus research on the media coverage of climate change across Britain, Germany, France and the US. Dr James Painter from the University of Oxford and Dr Neil Gavin from the University of Liverpool focused on the coverage of the UN IPCC reports in the news media and on television, respectively.

The focus then turned to the British parliament and the 2009 debate on the Climate Change Bill. How do politicians talk about climate change in public? This question was addressed by Rebecca Willis, a PhD candidate at Lancaster University and a member of the Green Alliance. Following that, Dr Neil Simcock, also from Lancaster University, explored the representations of ‘essential’ energy use in the UK media. The session concluded with Professor Alison Anderson from Plymouth University’s talk on the role of local news media in communicating climate change issues.

Our sincere thanks to all participants of the Lancaster workshop for making it a unique and very special event. This was an excellent opportunity to exchange ideas and share experiences which we hope will foster enhanced collaboration between the various disciplines.


25th Anniversary Conference for the Muslim News

I was honoured to attend the 25th Anniversary Conference for the Muslim News on the 15th September. The event was organized by the Society of Editors, and the Daily Telegraph had provided the venue – the spectacular Merchant Taylors’ Hall in the City of London. The event began with a speech by Bob Satchwell, Executive Director of the Society of Editors, and a welcoming speech by Lord Black of the Telegraph Media Group. Following that, Fatima Manji of Channel 4 News introduced me and I gave the morning’s keynote speech, discussing the work which I did with Paul Baker and Costas Gabrielatos (Discourse Analysis and Media Attitudes: The Representation of Islam in the British Press) looking at the representation of Islam and Muslims in the UK press. I was also very happy to be able to present some early findings from a follow-up study Paul Baker and I are currently conducting, supported by CASS and the Muslim NGO MEND, looking at how things have developed since our work was published. This is based on approximately 80 million words more of data, composed of all UK national newspaper articles mentioning Muslims and Islam in the period 2010–2015.

The audience included a mixture of journalists, newspaper editors and TV news reporters and editors. In addition, there were representatives from many faith groups and NGOs present too. The research was very well received by the audience. After the talk a panel was convened to discuss the work and take questions from the audience. The panel included John Wellington, the managing editor of the Mail on Sunday, Doug Wills, managing editor of the London Evening Standard and the Independent group of newspapers, and Sue Ryan, former managing editor of the Daily Telegraph and manager of the trainee programme for the Mail group. It was a real privilege to be able to discuss our work with them and I found them to be open to criticism and ready to consider change. One point of interest that emerged from the discussion, I thought, was that the press are often criticized for their use of language even when that usage is current in general English. While this puts the press in the spotlight, it also means that at times they can be in the vanguard of discussion and change in language use, as the recent discussion of the use of the word ‘migrant’ in the UK media has shown. This makes an engagement with media language all the more important for academic researchers.

Following this panel was a second panel, chaired by Fatima Manji, composed of the editors of ITN news and BBC news (Robin Elias and James Stephenson) as well as Channel 4’s Home Affairs correspondent Simon Israel. Julian Petley, author of Pointing the finger: Islam and Muslims in the British media, gave academic weight to this panel’s discussion. A very thought provoking discussion ensued about how to achieve a more inclusive and representative newsroom which demonstrated, once again, that the media was willing to engage in discussion and was prepared to embrace change.

After lunch, the final session, chaired by Ehsan Masood of Research Fortnight, began with a contribution from Jonathan Heywood of Impress on the Leveson-compliant media watchdog that Impress is developing. A lively debate followed, led by the head of IPSO, Sir Alan Moses. Sir Alan was joined by prominent editors from The Sunday Times (Eleanor Mills) and The Observer (Stephen Pritchard), as well as the Managing Editor of the London Evening Standard and Independent Group, Will Gore. A key tension highlighted by Sir Alan Moses in the debate was between what may be desirable in principle and what is achievable in reality. He also made the important point that we have to decide as a society where we want regulation to end and a softer form of social regulation to begin. I finished the afternoon with a brief and rewarding discussion of my work with Sir Alan.

The event was a rare and precious opportunity to showcase academic research to a range of key stakeholders and for that opportunity I am very grateful both to MEND and to Muslim News.

CL2015 – Presenting for the First Time at an International Conference

In July 2015 I was lucky enough to give a presentation at the Corpus Linguistics 2015 conference at Lancaster University. This was my first time presenting at an international conference, and I was nervous but very excited. I thought I would use this blog post to elaborate on my experience of presenting at a conference for the first time, and hopefully give some advice to people who may be worrying about giving their first conference presentation (or to see how my experience compares to those of you who are already well practiced at this)!

All the way back in January 2015 I put together my abstract to submit to the conference. This was quite a tricky process as the abstracts for CL2015 were required to be 750-1500 words in length. This meant that more than a simple summary was needed, but that I also couldn’t go into a great amount of detail about my method or results. After many re-drafts I managed to find a balance between the two, and with crossed fingers and toes I submitted my abstract. Crossing my fingers must have worked (or maybe it was all the re-drafting…) because I was delighted to find out that I had been accepted to present at the conference! The feedback from the reviewers was mostly positive, but, even when reviewers suggest lots of changes, it’s important to see this as a way to make your work even better rather than as negative feedback.

After the elation of being accepted had worn off, I had a sudden realisation of “Oh my God, I actually have to stand up and talk about corpus linguistics in front of a whole room of actual professional corpus linguists!” However, after lots of practice in front of my PhD supervisors and fellow students who would be presenting at the conference I began to feel more confident. That was until the first day of the conference arrived and I found out that I would be presenting in one of the biggest lecture theatres in the university!

After a few moments of worry about whether anyone would be able to hear me, or whether anyone would even come, I thought “Well, there’s no point being nervous, you’ve practiced as much as you can, let’s just enjoy it!” And, as is usually the case when you’ve spent a long time worrying about everything that could go wrong, everything went absolutely fine. I had a good sized audience, my presentation worked, and I managed to answer all of the questions put to me. Something I found very helpful whilst presenting was to have a set of cue cards with very short bullet point notes on for each slide – I barely looked at them, but it was reassuring to know that they were there in case I completely froze up! The only thing that didn’t go quite to plan was my timing; I was a couple of minutes short of the allotted 20 minutes for presenting. However, over the course of the conference I learnt that this is vastly preferable to being over the time limit. Giving a presentation which is too long makes you seem unrehearsed and leaves you with no time for questions or comments. It can also ruin the timings for all of the other presenters following you, so make sure you rehearse with a stopwatch beforehand!

I received some lovely feedback after the presentation both in person and on Twitter. This allowed me to meet lots of other people at the conference with similar research interests to mine, and gave me lots of ideas for future research.

Overall, presenting at CL2015 was a very enjoyable and extremely valuable experience. It taught me that, with the right amount of preparation, giving a presentation to experts in your field is not something to worry about, but rather an opportunity to showcase your work and help it progress. My top tips for those of you worrying about presenting at a conference would be:

1) Don’t rush your abstract; you won’t get the chance to worry about presenting if your abstract doesn’t showcase why your work is important and interesting.

2) Practice with friends, colleagues, anyone who will listen! And time yourself with a stopwatch – you don’t want to be the one that the chair has to use the scary ‘STOP TALKING NOW’ sign on!

3) Use cue cards if it makes you feel more confident. However, DON’T write a script – this will make you seem over-rehearsed and you won’t be as interesting to listen to.

4) Put your Twitter handle on your presentation slides so that you can network and people can give you feedback online as well as in person.

5) See presenting as a valuable chance to have your work evaluated by experts in your field, and enjoy it!

Do my experiences of presenting at a conference for the first time match yours? Have you found these tips helpful? Let us know @corpussocialsci!

Sino-UK Corpus Linguistics Summer School

At the end of July, Tony McEnery and I taught at the second Sino-UK corpus linguistics summer school, arranged between CASS and Shanghai Jiao Tong University. It was my first time visiting China, and we arrived during an especially warm season with temperatures hitting 40 degrees Celsius (we were grateful for the air conditioning in the room we taught in).

Tony opened the summer school, giving an introductory session on corpus linguistics, followed a few days later by a session on collocations, where he introduced CASS’s new tool for collocational networks, GraphColl. I gave a session on frequency and keywords, followed by later sessions on corpus linguistics and language teaching, and CL and discourse analysis. For the lab work components of our sessions, we didn’t use a computer lab. Instead, the students brought along their own laptops and tablets, including a few who carried out BNCweb searches on their mobile phones! I was impressed by how much the students attending already knew, and had to think on my feet a couple of times – particularly when asked to explain some of the more arcane aspects of WordSmith (such as the “Standardised Type Token ratio standard deviation”).

At the end of the summer school, a symposium was held where Tony gave a talk on his work with Dana Gablasova and Vaclav Brezina on the Trinity Learner Language corpus. I talked about some research I’m currently doing with Amanda Potts on change and variation in British and American English.

Also presenting were Prof Gu Yuego (Beijing Foreign Studies University) who talked about building a corpus of texts on Chinese medicine, and Prof. Roger K Moore (University of Sheffield) who discussed adaptive speech recognition in noisy contexts.

We were made to feel very welcome by our host, Gavin Zhen, one of the lecturers at the university, who went out of his way to shuttle us on the 90 minute journey from the university to our hotel on the Bund.

It was a great event and it was nice to see students getting to grips with corpus linguistics so enthusiastically.

The heart of the matter …

How wonderful it is to get to the inner workings of the creature you helped bring to life! I’ve just spent a week with the wonderful – and superbly helpful – team at CASS, devoting time to matters on the Trinity Lancaster Spoken Corpus.

Normally I work from London, situated in the very 21st-century environment of the web – I plan, discuss and investigate the corpus across the ether with my colleagues in Lancaster. They regularly visit us with updates, but the whole ‘system’ – our raison d’être, if you like – sits inside a computer. This, of course, makes for very modern research and allows a much wider circle of access and collaboration. But there is nothing like sitting in the same room as colleagues, especially over a period of a few days, to test ideas, to make connections and to get the neural pathways really firing.


It’s been a stimulating week, not least because we started with the wonderful GraphColl, a new collocation tool which allows the corpus to come to life before our eyes. As the ‘bubbles’ of lexis chase across the screen searching for their partners, they pulse and bounce. Touching one of them lights up more collocations, revealing the mystery of communication. Getting the collocation threshold right turns out to be critical in producing meaningful data that we can actually read – too loose and we end up with a density we cannot untangle; less, it seems, is more. It did occur to me that finally language had produced something that could contribute to the Science Picture Library, where GraphColl images could complement the shots of language activity in the brain. I’ve been experimenting with it this week – digging out question words from part of the corpus to find out how patterned they are – more to come.
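The idea of collocates ‘finding their partners’ within a span can be sketched in a few lines of code. Below is a minimal, hypothetical illustration of window-based collocate counting – this is not GraphColl’s actual implementation, and the tokenizer, function name and toy sentence are all my own:

```python
import re
from collections import Counter

def collocates(tokens, node, window=3):
    # Count tokens co-occurring with `node` within +/- `window` positions.
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the node word itself
                    counts[tokens[j]] += 1
    return counts

text = "the dog chased the cat and the cat chased the mouse"
tokens = re.findall(r"[a-z']+", text.lower())
print(collocates(tokens, "cat", window=2).most_common(3))
# [('the', 4), ('chased', 2), ('and', 2)]
```

In a real tool, these raw co-occurrence counts would then be weighted by an association measure (MI, log-likelihood, and so on) and filtered by the threshold discussed above, so that only the stronger pairings are drawn on screen.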

We’ve also been able to put more flesh on the bones of an important project developed by Vaclav Brezina – how to make the corpus meaningful for teachers (and students). Although we live in an era where the public benefit of science is rightly foregrounded, it can be hard sometimes to ‘translate’ the science and complexity of the supporting technology so that it is of real value to the very people who created the corpus. Vaclav has been preparing a series of extracts of corpus data that can come full circle back into the classroom by showing teachers and their students the way that language works – not in the textbooks but in real ‘lingua franca’ life. In other words, demonstrating the language that successful learners use to communicate in global contexts. This is going to be turned into a series of teaching materials with the quality and relevance being assured by crowdsourcing teaching activities from the teachers themselves.

Collocates of ‘time’ in the GESE interactive task

Meanwhile I am impressed by how far the corpus – this big data – is able to support Trinity by helping to build robust validity arguments for the GESE test. This is critical in helping Trinity’s core audience – our test takers – to understand: why should I do this test? What will the test demonstrate? What effect will it have on my learning? Is it fair? All in all, a very productive week.

Brainstorming the Future of Corpus Tools

Since arriving at the Centre for Corpus Approaches to Social Science (CASS), I’ve been thinking a lot about corpus tools. As I wrote in my blog entry of June 3, I have been working on various software programs to help corpus linguists process and analyse texts, including VariAnt, SarAnt, TagAnt. Since then, I’ve also updated my mono-corpus analysis toolkit, AntConc, as well as updated my desktop and web-based parallel corpus tools, including AntPConc and the interfaces to the ENEJE and EXEMPRAES corpora. I’ve even started working with Paul Baker of Lancaster University on a completely new tool that provides detailed analyses of keywords.

In preparation for my plenary talk on corpus tools, given at the Teaching and Language Corpora (TaLC 11) conference held at Lancaster University, I interviewed many corpus linguists about their uses of corpus tools and their views on the future of corpus tools. I also interviewed people from other fields about their views on tools, including Jim Wild, the Vice President of the Royal Astronomical Society.

From my investigations, it was clear that corpus linguists rely on and very much appreciate the importance of tools in their work. But, it also became clear that corpus linguists can sometimes find it difficult to see beyond the features of their preferred concordancer or word frequency generator and attempt to look at language data in completely new and interesting ways. An analogy I often use (and one I detailed in my plenary talk at TaLC 11) is that of an astronomer. Corpus linguists can sometimes find that their telescopes are not powerful enough or sophisticated enough to delve into the depths of their research space. But, rather than attempting to build new telescopes that would reveal what they hope to see (an analogy to programming) or working with others to build such a telescope (an analogy to working with a software developer), corpus linguists simply turn their telescopes to other areas of the sky where their existing telescopes will continue to suffice.

To raise the awareness of corpus tools in the field and also generate new ideas for corpus tools that might be developed by individual programmers or within team projects, I proposed the first corpus tools brainstorming session at the 2014 American Association of Corpus Linguistics (AACL 2014) conference. Randi Reppen and the other organizers of the conference strongly supported the idea, and it finally became a reality on September 25, 2014, the first day of the conference.

At the session, over 30 people participated, filling the room. After I gave a brief overview of the history of corpus tools development, the participants thought about the ways in which they currently use corpora and the tools needed to do their work. The usual suspects—frequency lists (and frequency list comparisons), keyword-in-context concordances and plots, clusters and n-grams, collocates, and keywords—were all mentioned. In addition, the participants talked about how they are increasingly using statistics tools and also starting programming to find dispersion measures. A summary of the ways people use corpora is given below:

  • find word/phrase patterns (KWIC)
  • find word/phrase positions (plot)
  • find collocates
  • find n-grams/lexical bundles
  • find clusters
  • generate word lists
  • generate keyword lists
  • match patterns in text (via scripting)
  • generate statistics (e.g. using R)
  • measure dispersion of word/phrase patterns
  • compare words/synonyms
  • identify characteristics of texts
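Two of the staples in the list above – frequency lists and keyword-in-context (KWIC) concordances – are simple enough to sketch directly. The following is a minimal, hedged Python illustration (any real concordancer handles tokenization, case, punctuation and markup far more carefully); the function names and toy sentence are invented for demonstration:

```python
import re
from collections import Counter

def tokenize(text):
    # Crude lowercase tokenizer; real corpus tools are far more careful.
    return re.findall(r"[a-z']+", text.lower())

def frequency_list(tokens):
    # Return (word, count) pairs, most frequent first.
    return Counter(tokens).most_common()

def kwic(tokens, node, window=3):
    # Keyword-in-context: each hit of `node` with `window` words of context.
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{node}] {right}")
    return lines

tokens = tokenize("The cat sat on the mat. The mat was on the floor.")
print(frequency_list(tokens)[:3])  # [('the', 4), ('on', 2), ('mat', 2)]
print(kwic(tokens, "mat"))
```

Everything else in the list – plots, n-grams, collocates, keywords – builds on exactly these kinds of token positions and counts, which is why a little scripting ability goes such a long way in this field.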

Next, the participants formed groups, and began brainstorming ideas for new tools that they would like to see developed. Each group came up with many ideas, and explained these to the session as a whole. The ideas are summarised below:

  • compute distances between subsequent occurrences of search patterns (e.g. words, lemmas, POS)
  • quantify the degree of variability around search patterns
  • generate counts per text (in addition to corpus)
  • extract definitions
  • find patterns of range and frequency
  • work with private data but allow for powerful handling of annotation (e.g. comparing frequencies of sub-corpora)
  • carry out extensive move analysis over large texts
  • search corpora by semantic class
  • process audio data
  • carry out phonological analysis (e.g. neighbor density)
  • use tools to build a corpus (e.g. finding texts, annotating texts, converting non-ASCII characters to ASCII)
  • create new visualizations of data (e.g. a roman candle of words that ‘explode’ out of a text)
  • identify the encoding of corpus texts
  • compare two corpora along many dimensions
  • identify changes in language over time
  • disambiguate word senses
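The first of these ideas, computing distances between subsequent occurrences of a search pattern, is simple to sketch. This is a toy illustration over plain word forms; the tools the participants envisaged would also work over lemmas and POS tags:

```python
def occurrence_gaps(tokens, node):
    """Token distances between each pair of subsequent occurrences of a node word."""
    positions = [i for i, tok in enumerate(tokens) if tok == node]
    return [later - earlier for earlier, later in zip(positions, positions[1:])]

tokens = "a b a c c a b a".split()
print(occurrence_gaps(tokens, "a"))  # → [2, 3, 2]
```

Summary statistics over these gaps (mean, variance) give a crude picture of how evenly a pattern is dispersed through a text, which connects this idea to the dispersion measures mentioned earlier.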

From the list, it is clear that the field is moving towards more sophisticated analyses of data. People are also thinking of new and interesting ways to analyse corpora. But, perhaps the list also reveals a tendency for corpus linguists to think more in terms of what they can do rather than what they should do, an observation made by Douglas Biber, who also attended the session. As Jim Wild said when I interviewed him in July, “Research should be led by the science not the tool.” In corpus linguistics, clearly we should not be trapped into a particular research topic because of the limitations of the tools available to us. We should always strive to answer the questions that need to be answered. If the current tools cannot help us answer those questions, we may need to work with a software developer or perhaps even start learning to program ourselves so that new tools will emerge to help us tackle these difficult questions.

I am very happy that I was able to organize the corpus tools brainstorming session at AACL 2014, and I would like to thank all the participants for coming and sharing their ideas. I will continue thinking about corpus tools and working to make some of the ideas suggested at the session become a reality.

The complete slides for the AACL 2014 corpus tools brainstorming session can be found here. My personal website is here.

Swimming in the deep end of the Spoken BNC2014 media frenzy

As someone who enjoys acting in his spare time, I’m rarely afraid of the chance to spend some time in the spotlight. But as I sat one morning a few weeks ago in my bedroom, in nothing but a dressing gown, about to do a live interview on a national Irish radio station, with no kind of media training or experience under my belt, I really did get a case of the nerves. I would spend the entire day appearing on over a dozen radio and TV broadcasts (thankfully with time to get dressed after the first), promoting participation in the Spoken BNC2014 project, and finding out the true meaning of the phrase ‘learning on the job’. My experiences taught me a few things about the relationship between the broadcast media and academic research, which I’ve summarised at the end of this blog.

In late July, CASS and Cambridge University Press announced a new collaboration which aims to compile a new spoken British National Corpus, known as the Spoken BNC2014. This is an ambitious project that requires contributions of recordings from hundreds, if not thousands, of speakers from across the entire United Kingdom. As a research team (which includes Lancaster’s Professor Tony McEnery, Cambridge’s Dr Claire Dembry, as well as Dr Vaclav Brezina, Dr Andrew Hardie, and me), we knew that we had to spread the word far and wide in order to drum up the participation of speakers across the country.

So, at the end of August, we put out a press release which teased some preliminary observations, and invited people to get involved by emailing corpus(Replace this parenthesis with the @ sign). These findings were based on some basic comparisons between the relative frequencies of the words in the demographic section of the original spoken BNC, and those of the first two million words collected for the Spoken BNC2014 project. We put out lists of the top ten words which had fallen and risen in relative frequency the most drastically between the 1990s data and today’s data.
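A minimal sketch of this kind of comparison, assuming the two corpora are available as plain token lists, might look like the following. This illustrates the general method (normalising counts to a per-million-words rate before comparing corpora of different sizes), not the project's actual pipeline:

```python
from collections import Counter

def per_million(tokens):
    """Relative frequency (occurrences per million words) for each word type."""
    total = len(tokens)
    return {w: c * 1_000_000 / total for w, c in Counter(tokens).items()}

def biggest_movers(old_tokens, new_tokens, n=10):
    """Words ranked by change in relative frequency between two corpora.

    Returns (biggest fallers, biggest risers), each a list of n words.
    """
    old, new = per_million(old_tokens), per_million(new_tokens)
    deltas = {w: new.get(w, 0) - old.get(w, 0) for w in set(old) | set(new)}
    ranked = sorted(deltas, key=deltas.get)
    return ranked[:n], ranked[-n:][::-1]

# Tiny hypothetical corpora, purely for illustration:
old = "walkman tape walkman radio".split()
new = "email phone email web".split()
print(biggest_movers(old, new, n=1))  # → (['walkman'], ['email'])
```

In practice such comparisons would also apply statistical tests and dispersion checks, since raw rate differences over small samples can be misleading.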

Words which had declined    Words which had risen
fortnight                   facebook
marvellous                  internet
fetch                       website
walkman                     awesome
poll                        email
catalogue                   google
pussy cat                   smartphone
marmalade                   iphone
drawers                     essentially
cheerio                     treadmill

It seems that these words really captured the imagination of the media powers that be. On the week of the release at the end of August, I was told on the Monday afternoon that the release had been sent out. By late that night, the story had already been picked up by the Daily Mail. Such was my joy, and perhaps naivety, that I sent out a brief and fairly humble blog post celebrating the fact that one person from one newspaper had run an article on our story. What I didn’t realise at the time was that, had I put out a blog post every time we discovered a piece of coverage the next day, I would still be writing them now.

The next morning I was woken by a message from Lancaster Linguistics and English Language department’s resident media celebrity, Dr Claire Hardaker, asking urgently for some information about the Spoken BNC2014 project. She had been contacted by LBC Radio, who had caught wind of the story and assumed, somewhat understandably, that, since it was a linguistics story involving Lancaster University, Claire would be directly involved. She isn’t, sadly, but they had lined up a live interview with her in twenty minutes’ time regardless, and she had kindly agreed to do it anyway with what information I could get to her in time.

After that, I soon realised that perhaps this story would garner more interest than a few newspaper articles. My phone went into melt-down, bleeping with emails from the PR team at the university and phone calls from unknown numbers. There was a 90 minute period where I couldn’t leave my room to get a shower, get dressed, and get on to the campus, simply because I was being lined up for so many interviews throughout the day. As such, I had to do my first there and then, in my dressing gown, while Claire Hardaker kindly waited on stand-by in the university press office in case I couldn’t make it to campus on time for my next.

Once I got there, it was a busy day of interviews right through to 6pm that evening. Over the course of the day, I was interviewed by international radio stations BBC World Service and Talk Radio Europe, UK national stations BBC Radio 4, Sky Radio, and Classic FM, Irish national station Today FM, and Russian national station Voice of Russia UK. I was also interviewed by UK regional BBC news stations London, Merseyside, Coventry & Warwick, Lancashire, and Three Counties. The highlight for me though was the TV interview with the Sky News channel, which I recorded using the Skype app on my little Windows tablet. The interviewer could see me, but I couldn’t see her (or indeed hear her all that well), and I had no idea that she was set up in the studio and that the video would be edited together and released that day. Aside from being shown on the Sky News television channel itself, and their website, the interview appeared on upwards of 40 regional radio websites, including Rock FM, Magic FM, The Bee, North Sound, Yorkshire Coast Radio, Wave 965, and Juice Brighton, as well as other media sites. Claire Dembry also got involved from Cambridge, doing further TV interviews with Sky News and even joining me for a live double interview with BBC Radio London.

So, what did I ‘learn on the job’ through my baptism of fire in the media world? Three main points:

  • Some interviewers thought I was announcing the death of the English language

Though most of the interviews went about as smoothly as I could have expected, with me remembering to plug the email address corpus(Replace this parenthesis with the @ sign) at any given opportunity, some were much harder work. Some interviewers seemed horrified at the thought of ‘losing’ words such as marvellous and cheerio, and wanted me to tell them what they could do to help rescue them. Though it was tempting to say “well if you keep saying them they won’t disappear…”, I instead politely made the point that language, like everything else to do with being human, changes over time, and that this is perfectly okay. Just like fashion. This ‘endangered species’ discourse came up in a few interviews, and it seemed that the interviewers felt I was suggesting that the English language was somehow shrinking or degrading over time.

  • Some interviewers thought I was actively promoting the changes I was reporting

In other cases, the interviewers seemed to imply that I was making recommendations about the words that speakers should avoid or should start saying more, in order to ‘stay up to date’ and not come across as ‘old fashioned’. In other words, I was mistaken for a prescriptivist rather than a descriptivist: someone trying to stop people from using the word catalogue, or encouraging everybody to say the word treadmill at least five times a day.

  • Some interviewers asked ‘nice’ questions, and some didn’t

This is a more general observation which I had suspected before I started, and which was confirmed as the interviews went on. It is a simple truth that the interviewers who ‘got’ the project the most were the ones who, for me, asked the best questions. When being interviewed about the list of words which have decreased in frequency I was, in varying forms and among many others, asked the following two types of question:

A: The words which were more popular in the 1990s but not so much now – tell me about ‘pussy cat’ – what’s going on there?

B: The words which were as popular in the 1990s as Facebook is now – I guess words like ‘marvellous’ and ‘catalogue’ are harder to spell and we’re getting lazier these days so we’re just going to say shorter words aren’t we?

For me, and I imagine many others, question A is the ‘nice’ question of this pair. The interviewer draws me to one example which looks interesting – fair enough – but importantly they make no inference themselves about the possible explanation. They set up a blank canvas and allow me to paint it in the way which is most advantageous to my purpose.

Question B, however, is much more problematic for me as the interviewee, and sadly questions of this kind occurred as often as, if not more often than, those like question A. Firstly, the interviewer has re-conceptualised the findings and created a false equivalence between the frequency of the declining words and that of the words on the rise. Therefore the possibility of conclusions like “marmalade used to be as popular as Facebook” or, worse, “iPhones replace pussy cats in British society” is opened up and thrown into the ether.

Secondly, and much harder to deal with immediately, is the lumping together of two completely unrelated words (marvellous and catalogue), the assumption of societal degradation (we’re getting lazier), the pseudo-logical causal relationship between written conventions and spoken interaction (harder to spell), which is itself based on that assumption of societal degradation (so we’re just going to say shorter words), and, the icing on the cake, the tag question which invites me to agree that everything the interviewer has just said is perfectly correct (aren’t we?). Yes, this is indeed not a nice question. The strategy I developed was to say that yes, everything you have just said could be the case, and then to go about repackaging the question into something more reasonable for me to say anything about. This was not easy, and in some cases I did it better than in others!

The recurring theme of my experience was the extent to which the interviewers’ expectations of the Spoken BNC2014 research matched what we are actually trying to do. Most of the time, there was a close match and the questions fit my aims well. In the cases where this didn’t happen, and the questions made all sorts of false assumptions, life was more difficult. I don’t think, however, that anyone was deliberately misconstruing our humble aims, and really I’d rather have given those difficult interviews, where I felt like I was in a fight for mutual understanding, than not to have given them at all for fear of being misunderstood. It seems that this is an inevitable aspect of daring to throw your work out of the bubble of academia and into the public sphere, where it really matters. My goal for next time is to improve the way that the research is communicated in the first place, and to plug potential potholes of misunderstanding in a way that is as accurate as reasonable but still makes a good story.

Overall, I think I managed as well as I could have done, given the abrupt start to the day and my naïve expectation that the press wouldn’t be as interested in the story as it turns out they were. Hopefully we’ll have generated lots of interest in the project. I’d like to thank Claire Hardaker for helping me learn the ropes as I went along, the staff at Lancaster University’s press office for keeping me in the right place at the right time, and the ESRC, who have since offered me some media training, which I will very gladly accept. Awesome!