The Spoken BNC2014 project features in the Daily Mail

The recently announced collaboration between Cambridge University Press and CASS, the Spoken BNC2014 project, has made headlines in the Daily Mail.

The article, entitled “No longer marvellous – now we’re all awesome: Britons are using more American words because traditional English is in decline”, describes the preliminary findings of the project, which is in its early stages.

To participate in the project, native British English speakers from all over the UK can record their conversations and send them to us as MP3 files. For each hour of good-quality recordings we receive, along with all associated consent forms and information sheets completed correctly, we will pay £18. A single recording does not have to be one hour long; participants may submit two 30-minute recordings, or three 20-minute recordings, and for each hour in total they will receive £18.

To register your interest in participating, please email corpus(Replace this parenthesis with the @ sign)cambridge.org

Gypsies, tramps and thieves? UK national newspaper depictions of Romanians and Bulgarians analysed

British tabloid newspapers repeatedly associated Romanians – but not Bulgarians – with criminality and anti-social behaviour during 2012-2013, a comprehensive new “big data” report by Oxford University’s Migration Observatory shows.

The report Bulgarians and Romanians in the British national press was undertaken by CASS Challenge Panel Member William Allen and Dora-Olivia Vicol at the Migration Observatory at Oxford University. It provides a detailed analysis of the language used by 19 British national newspapers to discuss Romanians and Bulgarians between December 1st 2012 and December 1st 2013. The analysis encompasses 4,000 articles, letters and comment pieces mentioning Romanians and/or Bulgarians, a total of more than 2.8 million words.

Key findings include:

  • Language used by tabloid newspapers to describe and discuss Romanians as a single group was frequently focused on crime and anti-social behaviour (gang, criminal, beggar, thief, squatter). This was less prevalent in broadsheet newspapers.
  • Where Romanians and Bulgarians were discussed together this was consistently in the context of immigration, across both tabloid and broadsheet newspapers.
  • Verbs used to describe or discuss Romanians and Bulgarians together, across both broadsheets and tabloids, were frequently related to travel (come, arrive, move, travel, head). In tabloids these included metaphors related to scale (flood, flock).
  • Words appearing before “Romanians and Bulgarians” in both tabloid and broadsheet newspapers were frequently related to prevention of movement (stop, control, block in tabloids; deter, restrict, dissuade in broadsheets).
  • References to Romanians and Bulgarians together were frequently associated with specific numbers, across both tabloid and broadsheet newspapers. The most common specific numbers were 29 million – the approximate combined populations of Romania and Bulgaria – and 50,000 – a prediction from MigrationWatch, a pressure group which campaigns for reduced immigration, of how many A2 migrants would be added to the UK population each year for five years following the end of transitional controls.

Some language associated with stories unrelated to UK migration was also evident – particularly Romanian abattoirs implicated in the horsemeat scandal and the blonde Bulgarian Roma child who sparked an ‘abduction’ investigation in Greece.

William Allen, co-author of the report said: “The report is valuable because it provides a comprehensive account of how British national newspapers discussed Romanians and Bulgarians during a key period. The language used to describe Romanians – particularly in tabloid newspapers – often mention them alongside criminality and anti-social behaviour, while this was not the case with Bulgarians.” Read the full report here.

New working paper on “Changing Climate and Society: The Surprising Case of Brazil” now available

Why is Brazil unique when it comes to climate change? Brazil is a major emerging economy and it is the sixth-largest emitter of greenhouse gases. However, its fossil fuel-based emissions are low by global standards. Brazil has been innovative in developing some relevant low carbon ways of generating energy and pioneered significant transport innovations. It has also played a major role in international debates on global warming and Brazilians’ degree of concern about global warming is higher than almost anywhere else. Brazil has the largest reserve of agricultural land in the world and it houses most of the Amazon forest and river basin.

This working paper examines the interesting case of Brazil, offering a general overview of the centrality of Brazil within climate policy and politics.

Download and read the complimentary working paper now.

Log Ratio – an informal introduction

In the latest version of CQPweb (v 3.1.7) a new statistic for keywords, collocations and lockwords is introduced, called Log Ratio.

“Log Ratio” is actually my own made-up abbreviated title for something which is more precisely defined as either the binary log of the ratio of relative frequencies or the binary log of the relative risk. Over the months I’ve been building up to this addition, people have kept telling me that I need a nice, easy to understand label for this measurement, and they are quite right. Thus Log Ratio. But what is Log Ratio?

Log Ratio is my attempt to suggest a better statistic for keywords/key tags than log-likelihood, which is the statistic normally used. The problem with this accepted procedure is that log-likelihood is a statistical significance measure – it tells us how much evidence we have for a difference between two corpora. However, it doesn’t tell us how big / how important a given difference is. But we very often want to know how big a difference is!

For instance, if we look at the top 200 keywords in a list, we want to look at the “most key” words, i.e. the words where the difference in frequency is greatest. But sorting the list by log-likelihood doesn’t give us this – it gives us the words we have most evidence for, even if the actual difference is quite small.

The Log Ratio statistic is an “effect-size” statistic, not a significance statistic: it does represent how big the difference between two corpora is for a particular keyword. It’s also a very transparent statistic in that it is easy to understand how it is calculated and why it represents the size of the difference.

When we present corpus frequencies, we usually give a relative frequency (or a normalised frequency as it is sometimes called): this is equal to the absolute frequency, divided by the size of the corpus or subcorpus. We often then multiply by a normalisation factor – 1,000 or 1,000,000 being the most usual factors – but this is, strictly speaking, optional and merely for presentation purposes.
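A minimal sketch of that calculation follows. The function name and the example counts are illustrative assumptions, not anything taken from CQPweb.

```python
def relative_frequency(absolute_freq, corpus_size, per=1):
    """Absolute frequency divided by corpus size, optionally scaled.

    `per` is the optional normalisation factor (e.g. 1_000_000 for
    "per million words"); it only changes how the figure is presented.
    """
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return absolute_freq / corpus_size * per


# Hypothetical counts: 600 hits in a corpus of 1,000,000 tokens.
raw = relative_frequency(600, 1_000_000)
per_million = relative_frequency(600, 1_000_000, per=1_000_000)
print(raw, per_million)
```

Whichever normalisation factor is chosen, it cancels out as soon as two relative frequencies are divided by one another, which is why it can be treated as optional.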

Once we have made a frequency into a relative frequency by dividing it by the corpus size, we can compare it to the relative frequency of the same item in a different corpus. The easiest way to do this is to say how many times bigger the relative frequency is in one corpus as opposed to the other, which we work out by dividing one relative frequency by another. For instance, if the relative frequency of a word is 0.0006 in Corpus A and 0.0002 in Corpus B, then we can say that the relative frequency in Corpus A is three times bigger than in Corpus B (0.0006 ÷ 0.0002 = 3).

Dividing one number by another gives us the ratio of two numbers, so we can call this measure of the difference between the two corpora the ratio of relative frequencies (statisticians often call it the relative risk, for reasons I won’t go into here), and, as I’ve explained, it simply tells us how many times more frequent the word is in Corpus A than in Corpus B – so it’s a very transparent and understandable statistic.
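The Corpus A / Corpus B example can be reproduced in a few lines. This is only a sketch: the raw counts below are hypothetical values chosen to give relative frequencies of 0.0006 and 0.0002, and `Fraction` is used simply to keep the arithmetic exact.

```python
from fractions import Fraction


def frequency_ratio(freq_a, size_a, freq_b, size_b):
    """Ratio of relative frequencies (Corpus A over Corpus B), kept exact."""
    if freq_b == 0:
        raise ValueError("ratio is undefined when the item is absent from Corpus B")
    return Fraction(freq_a, size_a) / Fraction(freq_b, size_b)


# Hypothetical counts: 600 and 200 hits in two corpora of 1,000,000 tokens.
ratio = frequency_ratio(600, 1_000_000, 200, 1_000_000)
print(ratio)  # the word is three times as frequent in Corpus A
```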

We could use the ratio of relative frequencies as a keyness statistic but, in my view, it is useful to convert it into a logarithm (“log” for short) first – specifically, the logarithm to base 2 or binary logarithm. Why do this? Well, here’s how taking the log of the ratio works:

  • A word has the same relative frequency in A and B – the binary log of the ratio is 0
  • A word is 2 times more common in A than in B – the binary log of the ratio is 1
  • A word is 4 times more common in A than in B – the binary log of the ratio is 2
  • A word is 8 times more common in A than in B – the binary log of the ratio is 3
  • A word is 16 times more common in A than in B – the binary log of the ratio is 4
  • A word is 32 times more common in A than in B – the binary log of the ratio is 5

That is, once we take a binary log, every point represents a doubling of the ratio. This is very useful to help us focus on the overall magnitude of the difference (4 vs. 8 vs. 16) rather than differences that are pretty close together (e.g. 4 vs. 5 vs. 6).  This use of the binary log is very familiar in corpus linguistics – the commonly-used Mutual Information measure, which is closely related to the ratio of relative frequencies, is also calculated using a binary log.

So now we’ve arrived at our measure – the binary log of the ratio of relative frequencies, or Log Ratio for short.
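As a sketch of the definition just reached, the function below takes two relative frequencies and returns the binary log of their ratio. The function name is an illustrative choice, and refusing zero frequencies is an assumption made here for simplicity; this post does not say how a tool should treat a word that is absent from one corpus.

```python
import math


def log_ratio(rel_freq_a, rel_freq_b):
    """Binary log of the ratio of two relative frequencies."""
    if rel_freq_a <= 0 or rel_freq_b <= 0:
        raise ValueError("both relative frequencies must be greater than zero")
    return math.log2(rel_freq_a / rel_freq_b)


# Reproduce the list above: each doubling of the ratio adds one point.
for times in (1, 2, 4, 8, 16, 32):
    print(f"{times} times more common -> Log Ratio {log_ratio(times * 0.0002, 0.0002)}")
```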

If you followed the explanation above, then you know everything you need to know in order to interpret Log Ratio scores. If you didn’t follow it, then here’s the crucial takeaway: every extra point of Log Ratio score represents a doubling in size of the difference between the two corpora, for the keyword under consideration.
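To illustrate why sorting by the two statistics can give different rankings, here is a small sketch with invented counts. It assumes that "log-likelihood" means the two-term form of the statistic commonly used for keyword lists; that formula is not given in this post, and the words and frequencies are hypothetical.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Two-corpus log-likelihood (assumed two-term form): 2 * sum(O * ln(O / E))."""
    total = freq_a + freq_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    ll = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:
            ll += observed * math.log(observed / expected)
    return 2 * ll


def log_ratio(freq_a, size_a, freq_b, size_b):
    """Binary log of the ratio of relative frequencies."""
    return math.log2((freq_a / size_a) / (freq_b / size_b))


SIZE = 10_000_000  # hypothetical size of both corpora
candidates = {
    "common_word": (11_000, 10_000),  # lots of evidence, ratio only 1.1
    "rare_word": (30, 3),             # less evidence, ratio of 10
}

by_ll = sorted(candidates, reverse=True,
               key=lambda w: log_likelihood(candidates[w][0], SIZE, candidates[w][1], SIZE))
by_lr = sorted(candidates, reverse=True,
               key=lambda w: log_ratio(candidates[w][0], SIZE, candidates[w][1], SIZE))
print(by_ll)
print(by_lr)
```

With these invented figures the high-frequency word comes first when sorting by log-likelihood, while the word with the tenfold difference comes first when sorting by Log Ratio.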

When we use Log Ratio for collocation, it has exactly the same interpretation, but applied to the zone around the node: every extra point of Log Ratio score represents a doubling in size of the difference between the collocate’s frequency near the node and its frequency elsewhere. The outcome is a collocation measure very similar to Mutual Information.
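One way to read this in code is sketched below. It is an interpretation, not CQPweb's implementation: the window handling (a symmetric span around each node occurrence, with everything outside those windows counted as "elsewhere"), the function name and the toy sentence are all assumptions.

```python
import math


def collocation_log_ratio(tokens, node, collocate, span=3):
    """Binary log of the collocate's relative frequency near the node
    over its relative frequency elsewhere in the text."""
    near = set()
    for i, token in enumerate(tokens):
        if token == node:
            start, end = max(0, i - span), min(len(tokens), i + span + 1)
            near.update(j for j in range(start, end) if j != i)
    near_hits = sum(1 for j in near if tokens[j] == collocate)
    far_hits = sum(1 for j, token in enumerate(tokens)
                   if j not in near and token == collocate)
    far_size = len(tokens) - len(near)
    if not near or far_size == 0 or near_hits == 0 or far_hits == 0:
        raise ValueError("not enough data to compute a ratio")
    return math.log2((near_hits / len(near)) / (far_hits / far_size))


# Toy text: "strong" fills 2 of the 4 positions next to "tea"
# and 1 of the 10 positions elsewhere, a ratio of 5.
text = ("strong tea is nice and strong tea is hot "
        "but strong coffee is bad").split()
score = collocation_log_ratio(text, node="tea", collocate="strong", span=1)
print(round(score, 2))
```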

Another advantage of Log Ratio is that it can be used for lockwords as well as keywords, which log-likelihood can’t. A Log Ratio of zero or nearly zero indicates a word that is “locked” between Corpus A and Corpus B. In consequence the new version of CQPweb allows you to look at lockwords – to my knowledge, the first general corpus tool that makes this possible.
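A lockword search could be sketched as a filter on Log Ratio scores close to zero. The tolerance of 0.1, the function name and the frequency tables below are hypothetical; how near to zero counts as "locked" is a choice this post leaves open.

```python
import math


def lockwords(freqs_a, size_a, freqs_b, size_b, tolerance=0.1):
    """Return words whose Log Ratio between the two corpora is within
    `tolerance` of zero, mapped to their Log Ratio score."""
    locked = {}
    for word in freqs_a.keys() & freqs_b.keys():
        if freqs_a[word] > 0 and freqs_b[word] > 0:
            score = math.log2((freqs_a[word] / size_a) / (freqs_b[word] / size_b))
            if abs(score) <= tolerance:
                locked[word] = score
    return locked


# Hypothetical frequency lists for two corpora of 1,000,000 tokens each.
corpus_a = {"the": 60_000, "awesome": 400, "marvellous": 20}
corpus_b = {"the": 61_000, "awesome": 50, "marvellous": 160}
print(lockwords(corpus_a, 1_000_000, corpus_b, 1_000_000))
```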

A more formal discussion of Log Ratio will be at the core of my presentation at the ICAME conference later this week. A journal article will follow in due course.

Twitter’s reaction to the Benefits Britain live debate

Benefits Street was a series of television programmes broadcast by the Channel 4 outlet between 6th January and 10th February 2014 which, as Channel 4 have claimed, “sparked a national conversation about Britain’s welfare system”. The programme focussed on a community of people living in the economically deprived area of Winson Green, Birmingham and specifically documented the families and individuals that inhabit James Turner Street.

Following the series of pre-recorded, documentary-style programmes (the last episode of which was aired on 16th February 2014), Channel 4 hosted a live debate entitled Benefits Britain which featured a range of public figures and those who were documented in Benefits Street. This report looks at a set of data collected on the date on which the Benefits Britain debate aired (17th February 2014).

Data

The data selected to analyse reaction to this series were Tweets, or short ‘micro-blogs’ that offer users the opportunity to voice their opinions and network with other viewers (e.g. using @ replies or # topics) in real-time. Tweets were collected from 00:00 on Sunday 16th February 2014 (the date of the final airing of Benefits Street) until 23:59 on Saturday 22nd February 2014 (totalling one calendar week’s worth of Twitter data).

To do this, we used the Twitter API to collect any tweets which contained in their content any of the following terms (note: the terms are not case sensitive, so terms can contain upper or lower case words without affecting data collection):

  • Benefits Britain
  • #BenefitsBritain
  • James Turner
  • Benefits Street
  • #BenefitsStreet

This query returned 81,100 tweets, totalling 1,501,938 words (tokens).
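The case-insensitive matching described above can be sketched as a simple filter over tweet text. The search terms are those listed; the function name is illustrative, and plain substring matching is an assumption here, since the exact matching behaviour of the Twitter API is not described in this report.

```python
SEARCH_TERMS = [
    "Benefits Britain",
    "#BenefitsBritain",
    "James Turner",
    "Benefits Street",
    "#BenefitsStreet",
]


def matches_query(tweet_text, terms=SEARCH_TERMS):
    """True if the tweet contains any search term, ignoring case."""
    text = tweet_text.casefold()
    return any(term.casefold() in text for term in terms)


# Hypothetical tweets, for illustration only.
print(matches_query("watching #benefitsbritain tonight"))
print(matches_query("nothing relevant here"))
```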

Results

#BenefitsBritain

The #benefitsbritain hashtag was the most frequent token in the corpus, occurring 45,400 times (3.02% of all tokens). Channel 4 adopted the #BenefitsBritain hashtag immediately following the end of the Benefits Street programme, which had used the #BenefitsStreet hashtag; the latter was less frequent (0.86%) during the period in which the corpus was collected.
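As a quick arithmetic check on the figures reported so far, the 3.02% value is what results from dividing the hashtag count by the total number of tokens rather than by the number of tweets. The variable names below are illustrative.

```python
TWEETS = 81_100          # tweets returned by the query
TOKENS = 1_501_938       # total words (tokens) in the corpus
HASHTAG_HITS = 45_400    # reported count for #benefitsbritain

share_of_tokens = HASHTAG_HITS / TOKENS * 100
# Share of tweets if the hashtag appeared at most once per tweet.
share_of_tweets = HASHTAG_HITS / TWEETS * 100

print(f"{share_of_tokens:.2f}% of tokens, {share_of_tweets:.2f}% of tweets")
```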

Several concerns are frequently expressed by users of the #BenefitsBritain hashtag. The word people is the most frequent ‘content word’ in tweets containing the hashtag, occurring in 15.2% of those tweets, most often in the word cluster people on benefits. This cluster is associated with a number of verbs including are, should, and have, which appear to be involved in evaluating who people on benefits are as well as their (perceived) behaviours.
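A word cluster search of this kind can be approximated by counting the n-grams that contain a given word. The sketch below uses whitespace tokenisation and three invented sample tweets; it illustrates the idea and is not the corpus tool used for this analysis.

```python
from collections import Counter


def clusters_containing(tweets, word, n=3):
    """Count every n-word sequence (cluster) that includes `word`."""
    counts = Counter()
    for tweet in tweets:
        tokens = tweet.lower().split()
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if word in gram:
                counts[" ".join(gram)] += 1
    return counts


# Invented sample tweets, for illustration only.
sample = [
    "Not all people on benefits are lazy",
    "Some people on benefits are good people",
    "people on benefits should volunteer",
]
people_clusters = clusters_containing(sample, "people")
print(people_clusters.most_common(3))
```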

Who people on benefits are

Some appear to be challenging the stereotype that benefits claimants are workshy or lazy:

  • #benefitsbritain Some people on benefits are good people who’ve gone through a bad time not everyone on benefits are scumbags.
  • don’t think people should comment on things until they have been in that situation. Not all people on benefits are lazy etc!#BenefitsBritain
  • #BenefitsBritain am so annoyed that that show has stigmatised all people on benefits are scum when we all aren’t IT’S SO ANNOYING!!!!!

Some argue the absolute opposite:

  • #BenefitsBritain kiss my ass i think most people on benefits are lazy and need to get a damn job!!!! Cut all benefits for able bodies people

Or assume that claiming benefits is a result of a lack of skills or underlying criminality:

  • Half the people on benefits are unemployable stop there benefit and they commit crime and it costs more to imprison them #BenefitsBritain

And some are somewhat more ambivalent:

  • #BenefitsBritain Not all people on benefits are lazy, but if it becomes a lifestyle its dangerous territory, idle minds are the devils work.

What people on benefits do

In terms of evaluating what people on benefits do, a number of users question the (perceived) behaviours of those claiming benefits:

  • Fail to see why some people on benefits are allowed to spend their money on drink, cigarettes and drugs #BenefitsBritain
  • watching the debate #BenefitsBritain most people on benefits have a criminal record now who wants to give them people a chance no one

Others propose possible restrictions on (perceived) social and spending behaviours:

  • Why don’t people on benefits have vouchers instead of money? Then they wouldn’t spend it on drink and drugs #BenefitsBritain
  • I stand by the fact that people on benefits should not have children when they cant afford to feed themselves. #BenefitsBritain
  • Agree with the guy who said people on benefits should be given food stamps #BenefitsBritain

Or suggest certain behavioural conditions be fulfilled in order to claim benefits:

  • People on benefits should be made to go out&do something before they get money volunteering or something!! #BenefitsBritain #BenefitsStreet
  • Active people on benefits should earn their benefits through voluntary work to assist the community #BenefitsBritain
  • People on benefits should only get paid if they do voluntary/training work. Then there is some progress in their lives. #BenefitsBritain

Some argue that people are workshy:

  • People on benefits have lacked the ability to work hard in education there for getting a low paid job or none at all #BenefitsBritain
  • #BenefitsBritain all people on benefits should get of their arse and work like the rest of us do everyday

Or have a grudge against those who work:

  • What is it that some people on benefits have against working class people who’ve been successful? #benefitsdebate #BenefitsBritain

Personalities

Two specific names were also frequent in tweets using the #BenefitsBritain hashtag.

The first is the host of the Benefits Britain debate, Richard Bacon. Mainly, those who spoke about Bacon brought his abilities as a host into question. One of the more creative and less direct insults was:

  • Richard Bacon is a cross between Jeremy Kyle & Kilroy! @Channel4 would have been better off getting @rickedwards1 hosting #BenefitsBritain

The second person featuring frequently was (White) Dee, a prominent personality in the Benefits Street programme. Mainly, the response to her was positive, although there were some negative reactions:

  • #BenefitsBritain always the governments fault -what nonsense Dee will never look at herself and see what a lazy scroungers she is
  • My view on #BenefitsBritain Richard Bacon is a cock oh and White Dee is a sweaty lazy cow

To-infinitives

Aside from the #BenefitsBritain hashtag, the next most frequent token in the corpus was the determiner ‘the’. The fourth most frequent token was the word ‘to’, which can be interpreted either as a preposition or as part of infinitive verbs. Looking at clusters in which to occurs revealed that in fact to occurred within a number of infinitive verb forms. I look here at the 3 most frequent: to be, to work, and to get, to see how infinitives work within the #BenefitsBritain tweet corpus and what ideas they are used to express.
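Counting what follows to is one simple way to surface candidate infinitives. The sketch below uses a rough regular-expression tokeniser and invented sample tweets. It does not distinguish the preposition from the infinitive marker, so the output still has to be read by hand.

```python
import re
from collections import Counter


def words_after_to(tweets):
    """Count two-word sequences of the form 'to X' across tweets."""
    counts = Counter()
    for tweet in tweets:
        tokens = re.findall(r"[a-z'’]+", tweet.lower())
        for current, following in zip(tokens, tokens[1:]):
            if current == "to":
                counts[f"to {following}"] += 1
    return counts


# Invented sample tweets, for illustration only.
sample = [
    "this is going to be interesting",
    "they need to get a job",
    "people want to work",
    "they need to be weaned off",
    "he went to town",
]
to_clusters = words_after_to(sample)
print(to_clusters.most_common())
```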

To be

The infinitive verb to be was frequently found being used in a number of interesting ways.

Users were excited that the Benefits Debate was going to be interesting:

  • #BenefitsBritain this is going to be interesting!

And frequently challenged the stereotype that only poor people are drug addicts, as with this retweet:

  • ‘Billionaire’s Row residents are as likely to be drug addicts as people on Benefits Street’ says MP Chris Bryant http://t.co/JG750GrJE5

When found in the cluster need to be, people and work again became central to debate:

  • Finally people talking about politics. Reality is we need to be paying people a living wage vote labour #benefitsstreet
  • Benefits is like a Government Drug. These people need to be weaned off the drug and get a job! #BenefitsBritain #BenefitStreet

To work

The infinitive to work not only most frequently occurs in the word cluster want to work, but is also closely associated with different ways of referring to people, either through pronouns (they, everyone, I), or the most frequent ‘content word’ in the corpus, people. As such, the formation want to work is found in tweets expressing general opinions about the desirability of work:

  • Some people do want to work but it’s not as simple sick people are getting harassed to work when they are not fit #BenefitsBritain
  • Majority of disabled and unemployed people want to work #BenefitsBritain #BenefitStreet
  • I am so sick of hearing, make work pay, incentivize people to work. People want to work. The jobs don’t pay a living wage #benefitsbritain

Moreover, want to work is strategically used in straw man arguments against the idea that people want to work:

  • These people clearly want to work? Really??? has he watched the same programme? #BenefitStreet #BenefitsBritain

And frequently collocates with the negative forms such as don’t, doesn’t in examples such as the following which express the idea that those people claiming benefits see work as undesirable:

  • Let’s be real most of the people on the programme don’t really want to work anyway #BenefitsBritain
  • If we’re being fair…there are also A LOT of people on benefits who definitely DON’T want to work… #BenefitsDebate #BenefitsBritain
  • #BenefitsStreet there is an inherent problem with some ppl in this country; they don’t want to work! Send them overseas; no benefits
  • #BenefitsBritain Not all people on benefits want to work just come #skelmersdale for the next series. Wont need no editing or bribes!!

To get

To get is the third most frequent infinitive verb formation and occurs most frequently in the phrase to get a job. Underpinning how this phrase is used is a moralised debate surrounding (un)employment which naturalises and elevates the status of employment and the employed and alienates and derides unemployment and the unemployed; having a job makes you good, having no job makes you bad. This is borne out by the data.

This includes talking about the difficulty of getting a job:

  • “#BenefitsBritain makes a lot of valid points, you need experience to get a job, you need experience to get experience! Can never win!”

Structural/political issues:

  • #benefitsstreet #BenefitsBritain is all the fault of #thatcher who closed everything down then #cameron who makes it difficult to get a job

And corruption:

  • #BenefitsBritain to get a job it’s not all what you know its who you know #thesystemsfucked

As well as reactions against pressure to work within a climate where work is hard to find:

  • These guys on Benefits Britain thinking it’s so easy to get a job. Get back to reality you stuck up twats! #BenefitsBritain #BenefitsStreet

However, most of the uses of the to get a job phrase target jobseekers and construct them in relation to prejudices and assumptions about the (un)employed:

  • Why is everyone too scared to stand up and say ‘work harder to get a job/off drugs/off drink’? #BenefitsBritain
  • #benefitsstreet this show makes me so angry.. Get off your fat ass and try to get a job instead of sponging off the country
  • Fuck this Benefits Street debate is making me angry. Lazy twats need to get a fucking job.
  • #BenefitsBritain kiss my ass i think most people on benefits are lazy and need to get a damn job!!!! Cut all benefits for able bodies people

Summary

This data highlights a kind of moralisation of (un)employment, where ideologies underpinning this moralisation are both reinforced and challenged. The data reveals a number of apparently stable linguistic formations used to talk about unemployed benefits claimants, which appear to have revealed aspects of the ideological underpinnings of the debate.

‘Fight’ metaphors for cancer revisited: Are they always bad?

By the ‘Metaphor in End-of-Life Care’ project team, funded by the UK’s Economic and Social Research Council (ESRC):

Elena Semino, Veronika Koller, Jane Demmen, Andrew Hardie, Paul Rayson, Sheila Payne (Lancaster University) and Zsófia Demjén (Open University)

Recent media controversy over the use of social media by people with terminal illness has sparked a new debate on ‘fight’ metaphors for cancer. Writing in the New York Times on 12th January 2014 about Lisa Bonchek Adams’s blogging and tweeting, Bill Keller describes her as having ‘spent the last seven years in a fierce and very public cage fight with death’. On the one hand, Keller acknowledges that Bonchek Adams’s ‘decision to treat her terminal disease as a military campaign has worked for her’. On the other hand, he favourably compares his own father-in-law’s ‘calm death’ with what he describes as Bonchek Adams’s choice to be ‘constantly engaged in battlefield strategy with her medical team’.

As part of the ESRC-funded project ‘Metaphor in End-of-Life Care’ at Lancaster University, we are studying the use of ‘fight’ metaphors by cancer patients in a large collection of interviews and online fora. We have found plenty of evidence of the negative sides of these metaphors, which have been criticised by many patients and commentators before Keller, and most famously by Susan Sontag in Illness as Metaphor (1979). Seeing illness as a fight can make people feel inadequate and responsible if they do not get better, as when a patient in our data writes: ‘I feel such a failure that I am not winning this battle’. Military metaphors can also express distressing ways of perceiving oneself, such as when some patients describe themselves as ‘time bombs’ during periods of remission.

On the other hand, we are finding that, for some patients, ‘fight’ metaphors do seem to provide meaning, purpose and a positive sense of self. For example, writing in an online forum, a cancer sufferer proudly comments: ‘my consultants recognised that I was a born fighter’. Another patient says in an interview: ‘I don’t intend to give up; I don’t intend to give in. No I want to fight it. I don’t want it to beat me, I want to beat it. Because I don’t think we should give up trying.’ ‘Fight’ metaphors are also used to give and receive encouragement and solidarity. For example, a patient writes ‘let me hear you scream the battle cry to spur us on to win this war’, while another ends an online forum post with the words ‘Soldier on everybody’.

We would not go as far as to argue that ‘fight’ metaphors should be rehabilitated: they can do real harm, and nobody should ever feel under pressure to see themselves as fighters. However, as with most metaphors, the implications of ‘fight’ metaphors change depending on who uses them, why, where and how. Our data suggest that they can be helpful enough to be recognised and accepted as one of many possible ways of approaching illness, including its terminal phase.


The Economic and Social Research Council (ESRC) is the UK’s largest organisation for funding research on economic and social issues. It supports independent, high quality research which has an impact on business, the public sector and the third sector. The ESRC’s total budget for 2013/14 is £212 million. At any one time the ESRC supports over 4,000 researchers and postgraduate students in academic institutions and independent research institutes.

Originally posted on eHospice. Visit their page for more palliative care news from around the world.

The Twitter reaction to the sentencing of the Lee Rigby murderers – 26th February 2014

by Love, R., McEnery, T. & Wattam, S.

Introduction

The ESRC-funded Centre for Corpus Approaches to Social Science (CASS) at Lancaster University has undertaken some preliminary research into the immediate reaction on Twitter to the sentencing of the Lee Rigby murderers on Wednesday 26th February 2014. This document summarises our findings.

Background

On the afternoon of Wednesday 22nd May 2013, British soldier Lee Rigby was murdered by two men, Michael Adebolajo and Michael Adebowale, near the Royal Artillery Barracks in Woolwich, London. The attack, which was carried out in broad daylight, quickly became a major national news story. In December 2013 the perpetrators were found guilty of murder, and they were sentenced on Wednesday 26th February 2014. Adebolajo received a whole-life sentence (meaning he will never be released) and Adebowale received a life sentence with a minimum term of 45 years’ imprisonment.

How the research was carried out

We carried out our research by using the Twitter API to collect a large number of tweets[1] that referred to the Rigby case in some way between 00.00 and 23.59 on Wednesday 26th February 2014. All tweets containing one or more of the following terms were included in our search:

rigby, adebolajo, adebowale, woolwich trial, woolwich sentence, woolwich sentencing, justice Sweeney, #leerigby, #rigbytrial, #rigbysentence, #woolwich, #woolwichmurder, #woolwichattack, #woolwichtrial

Using these search terms we collected a total of 57,097 tweets over the 24 hour period, which included retweets (RTs), quotes etc. This amounted to a total of 1,109,136 words of Twitter discussion about the case. We then used a set of tools and methods developed in corpus linguistics to find out the ways in which Twitter users discussed the sentencing on the day of the decision.

Findings

The following is a selection of preliminary findings based on the analysis of the tweets.

  • Nearly two thirds of the tweets were retweets[2]

Nearly 35,000 tweets (60.1% of tweets) included the retweet abbreviation RT. This confirms that Twitter discussion of the Lee Rigby case was highly retweeted and shared by Twitter users. The top ten most frequently retweeted Twitter handles appear to have been:

Rank  Handle            Description
1     @bbcbreaking      Breaking news account for BBC News
2     @skymarkwhite     Home Affairs Correspondent for Sky News
3     @skynewsbreak     Breaking news account for Sky News
4     @poppypride1      An “independent account supporting all troop charities”
5     @jakeleonardx     Young footballer at Crewe Alexandra Academy
6     @itvnews          Main account for ITV News
7     @courtnewsuk      News reports account for the Old Bailey
8     @thesunnewspaper  Main account for The Sun newspaper
9     @bbcnews          Main account for BBC News
10    @unnamedinsider   Satirical commentator

Based on these it seems that the most popular form of Twitter interaction relating to the Rigby sentencing was to retweet news updates from well-known news providers including BBC News, Sky News, ITV News and The Sun. @jakeleonardx is not a celebrity (he has fewer than 1,000 followers), but when he tweeted a photo of Lee Rigby’s son with the caption “Poor little lad, RIP Lee Rigby”, it was retweeted nearly 1,000 times. @unnamedinsider appears to be better known (with over 34,000 followers), and posted two tweets ridiculing the BNP and EDL protesters who had gathered outside the Old Bailey for the sentencing.
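The retweet count and the ranking of retweeted handles could be derived along the following lines. This is a sketch under stated assumptions: it relies on the conventional "RT @handle" form, lower-cases handles before counting, and uses invented sample tweets. As footnote [2] notes, retweets without the letters ‘RT’ would be missed by this approach.

```python
import re
from collections import Counter

RT_MARKER = re.compile(r"\bRT\b")        # the retweet abbreviation
RT_HANDLE = re.compile(r"\bRT @(\w+)")   # handle that follows it


def retweet_stats(tweets):
    """Return (percentage of tweets containing RT, Counter of retweeted handles)."""
    retweets = [t for t in tweets if RT_MARKER.search(t)]
    handles = Counter(
        match.group(1).lower()
        for tweet in retweets
        for match in RT_HANDLE.finditer(tweet)
    )
    share = 100 * len(retweets) / len(tweets) if tweets else 0.0
    return share, handles


# Invented sample tweets, for illustration only.
sample = [
    "RT @BBCBreaking: sentencing update",
    "RT @bbcbreaking: further update",
    "Justice served today",
    "rt in lower case is not counted by this pattern",
    "RT @SkyNewsBreak: sentencing under way",
]
share, handles = retweet_stats(sample)
print(share, handles.most_common(2))
```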

  • The most salient word (apart from names and Twitter terms) was life

Twitter users were very concerned with the nature of the sentence being delivered, using the word ‘life’ 19,498 times (34.1% of tweets). The most common three-word phrase in which it was used was life in prison (4,369 times, 7.7% of tweets), suggesting that these uses concerned not the loss of life but the restriction of the perpetrators’ lives.

  • Some Twitter users wanted more than whole-life terms for the perpetrators

As well as whole-life terms, Twitter users strongly expressed their opinion about other punishments they deemed suitable for the perpetrators. In particular, highly salient words like rot, deserve, should and hang indicate this. The most popular three-word expression relating to such desired punishments is rot in hell. Furthermore the word deserve occurred 1,295 times (2.3% of tweets), an indication of a clear evaluation of the sanction proposed: popular four-word phrases containing deserve included deserve a life sentence, deserve to be hung, and deserve the death penalty. Likewise the word should is almost exclusively used to wish death upon the perpetrators of the murder, while hang relates to the most popular way in which Twitter users wanted capital punishment to be undertaken upon the killers.

  • Michael Adebolajo was discussed more than Michael Adebowale

The surname ‘Adebolajo’ was tweeted 15,092 times (26.4% of tweets) compared to ‘Adebowale’ being tweeted only 11,729 times (20.5% of tweets). This indicates that the perpetrator who received the whole-life sentence was of more concern for tweeters than the perpetrator who received the less severe punishment.

  • The most salient word used to describe Adebolajo and Adebowale was scum, and the most salient swear word was cunts

Twitter’s word of choice for the perpetrators was scum, which occurred 1,466 times (2.6% of tweets). Popular phrases included ‘the scum’, ‘this scum’, ‘two scum’, ‘them scum’ and ‘those scum’, and popular words that combined with scum include absolute, fucking, murdering and jihadi. Furthermore, the swear word cunts was used 800 times in tweets about the Rigby sentencing (1.4% of tweets). This further indicates that, as expected, there was considerable disapproval and anger expressed towards the perpetrators. Words that combined with cunts to describe the perpetrators included dirty, sick, horrible, fucking, evil, scummy, vile, muslim, murdering and filthy.

  • In terms of religion, Twitter users were most concerned about Islam

The three most salient religious words were islamistas, Islam and Muslim. Islamistas (Spanish for Islamists) occurred in Spanish language tweets reporting the result of the sentencing (though most tweets were produced in English, and by users from the UK, there appears to have been activity from all over the world). The other terms mostly occur in retweets and discussions about the judge’s statement that the perpetrators had betrayed Islam by murdering Rigby. The general opinion appears to be that the murder had nothing to do with the religion of Islam.

Conclusion

This preliminary analysis, using tools and methods from corpus linguistics, has captured a general impression of the Twitter reaction to the sentencing of the Lee Rigby murderers. It seems that the main reaction centred around the nature of the sentencing and Twitter users’ wishes for both Michael Adebolajo and Michael Adebowale to receive at least a whole-life sentence but preferably death. Furthermore, some Twitter users appeared unrestrained in their willingness to use offensive language to describe the killers.


[1] As many as possible were collected, but given the immediacy of the event and the nature of the search method, we acknowledge that Twitter users may have tweeted about the Rigby trial without using any of these terms.

[2] This figure may be even higher if we take into account retweets that do not contain the letters ‘RT’.

Update on Changing Climates

The Changing Climates project is a corpus-based investigation of discourses around climate change. It aims to examine how climate change has been framed in media coverage across Britain and Brazil in the past decade. Here, we look at two different scenarios. Recent surveys have shown that climate change is currently considered a high priority concern within Brazil, with the country showing a higher degree of concern than almost anywhere else. By contrast, climate change scepticism is increasingly prominent in the British public sphere.

We are pleased to announce that we have just finished collecting the data. The Brazilian corpus contains about 8 million words, comprising texts from 12 newspapers. The British corpus is much larger. It has nearly 80 million words and includes texts published by all major British broadsheet and tabloid papers.

More about the Metaphor in End of Life Care project at Lancaster University

The CASS-affiliated Metaphor in End of Life Care project has just released a free resource containing information of interest to many of our readers. Download the document now to learn more about the project, from basic concepts (what is metaphor, and how is it used in everyday life?) to more specific details (why study metaphor in end-of-life care?). Some interesting initial findings are also included. For instance, “Family carers often say that their emotions can only be safely ‘released’ when talking to people who are ‘in the same boat’.” Read on to learn more about the project.

Using Corpora to Analyze Gender

I wrote UCAG during a sabbatical as a semi-sequel to a book I published in 2006 called Using Corpora for Discourse Analysis. Part of the reason for the second book was to update and expand some of my thinking around discourse- or social-related corpus linguistics. As time has passed, I haven’t become disenamoured of corpus methods, but I have become more reflective and critical of them and I wanted to use the book to highlight what they can and can’t do, and how researchers need to be guarded against using tools which might send them down a particular analytical path with a set of pre-ordained answers. Part of this has involved reflecting on how interpretations and explanations of corpus findings often need to come from outside the texts themselves (one of the tenets of critical discourse analysis), and subsequently whether a corpus approach requires analysts to go further and critically evaluate their findings in terms of “who benefits”.

Another way in which my thinking around corpus linguistics has developed since 2006 is in considering the advantages of methodological triangulation (or approaching a research project in multiple ways). In one analysis chapter I take three small corpora of adverts from Craigslist and try out three methods of attempting to uncover something interesting about gender from them – one very broad involving an automated tagging of every word, one semi-automatic relying on a focus on a smaller set of words, and another much more qualitative, relying on looking at concordance lines only. In another chapter I look at “difficult” search terms – comparing two methods of finding all the cases where a lecturer indicates that a student has given an incorrect answer in a corpus of academic-related speech. Would it be better to just read the whole corpus from start to finish, or is it possible to devise search terms so concordancing would elicit pretty much the same set?

The book also gave me a chance to revisit older data, particularly a set of newspaper articles about gay people from the Daily Mail which I had first looked at in Public Discourses of Gay Men (2005). As a replication experiment I revisited that data and redid an analysis I had first carried out about 10 years ago. While the idea of an objective researcher is fictional, corpus methods have aimed to redress the issue of researcher bias to an extent – although in retreading my steps, I did not obtain exactly the same results. Fortunately, the overall outcome was the same, but there were a few important points that the 10 years younger version of me missed. Does that matter? I suspect it doesn’t invalidate the analysis although it is a useful reminder about how our own analytical abilities alter over time.

Part of the reason for writing the book was to address other researchers who are either from corpus linguistics and want to look at gender, or who do research in gender and want to use corpus methods. I sometimes feel that these two groups of people do not talk to each other very much and as a result the corpus research in this area is often based around the “gender differences” paradigm where the focus is on how men and women apparently differ from each other in language use (with attendant metaphors about Mars and Venus). Chapters 2 and, to an extent, 3 address this by trying a number of experiments to see just how much lexical variation there is in sets of spoken corpora of male and female language – and when difference is found, how can it be explained? I also warn against lumping all men together into a box to compare them with all women who are put in a second box. The variation within the boxes can actually be the more interesting story to tell and this is where corpus tools around dispersion can really come into their own. So even if, for example, men do swear more than women, it’s not all men and not all the time. On the other hand, some differences which are more consistent and widespread can be incredibly revealing, although not in ways you might think – chapter 2 took me down an analytical path that ended up at the word Christmas – not perhaps an especially interesting word relating to gender, but it produced a lovely punchline to the chapter.
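As a rough illustration of what looking at dispersion involves, the sketch below reports how often each speaker in a group uses a word and how many speakers use it at all. It is a minimal example under stated assumptions, not the procedure from the book: the function name `dispersion`, the speaker labels and the sample utterances are invented, and established dispersion measures are more sophisticated than this.

```python
import re

def dispersion(texts_by_speaker, word):
    """Return per-speaker rate per 1,000 tokens, and how many speakers use the word."""
    rates = {}
    for speaker, text in texts_by_speaker.items():
        tokens = re.findall(r"[a-z']+", text.lower())
        # Normalise by speaker size so talkative speakers do not dominate.
        rates[speaker] = 1000 * tokens.count(word) / len(tokens) if tokens else 0.0
    speakers_using = sum(1 for rate in rates.values() if rate > 0)
    return rates, speakers_using

# Invented sample data: three hypothetical speakers from one group.
group = {
    "M1": "well well that was fine",
    "M2": "it was fine",
    "M3": "fine then",
}

rates, used_by = dispersion(group, "well")
print(rates, used_by)
```

In this toy data the word is frequent for the group as a whole but comes from a single speaker, which is the kind of pattern a group-level total would hide.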

It was also good to introduce different corpora, tools and techniques that weren’t available in 2006. Mark Davies has an amazing set of online corpora, mostly based around American English, and I took the opportunity to use the COHA (Corpus of Historical American English) to track changes in language which reflects male bias over time, from the start of the 19th century to the present day. Another chapter utilises Adam Kilgarriff’s online tool Sketch Engine, which allows collocates to be calculated in terms of their grammatical relationships to one another. This enabled a comparison of the terms boy and girl in which I could consider the verbs that position each as subject or object. So girls are more likely to be impressed while boys are more likely to be outperformed. On the other hand, boys cry whereas girls scream.

It would be great if the book inspired other researchers to consider the potential of using corpora in discourse/social related subjects as well as showing how this potential has expanded in recent years. It’s been fun to explore a relatively unexplored field (or rather travel a route between two connecting fields) but it occasionally gets lonely. I hope to encounter a few more people heading in the same direction as me in the coming years.