Why is Brazil unique when it comes to climate change? Brazil is a major emerging economy and the sixth-largest emitter of greenhouse gases, yet its fossil fuel-based emissions are low by global standards. Brazil has been innovative in developing relevant low-carbon ways of generating energy and has pioneered significant transport innovations. It has also played a major role in international debates on global warming, and Brazilians’ degree of concern about global warming is higher than almost anywhere else. Brazil has the largest reserve of agricultural land in the world, and it houses most of the Amazon forest and river basin.
In the latest version of CQPweb (v3.1.7), a new statistic for keywords, collocations and lockwords is introduced: Log Ratio.
“Log Ratio” is actually my own made-up abbreviated title for something which is more precisely defined as either the binary log of the ratio of relative frequencies or the binary log of the relative risk. Over the months I’ve been building up to this addition, people have kept telling me that I need a nice, easy to understand label for this measurement, and they are quite right. Thus Log Ratio. But what is Log Ratio?
Log Ratio is my attempt to suggest a better statistic for keywords/key tags than log-likelihood, which is the statistic normally used. The problem with this accepted procedure is that log-likelihood is a statistical significance measure – it tells us how much evidence we have for a difference between two corpora. However, it doesn’t tell us how big / how important a given difference is. But we very often want to know how big a difference is!
For instance, if we look at the top 200 keywords in a list, we want to look at the “most key” words, i.e. the words where the difference in frequency is greatest. But sorting the list by log-likelihood doesn’t give us this – it gives us the words we have most evidence for, even if the actual difference is quite small.
The Log Ratio statistic is an “effect-size” statistic, not a significance statistic: it represents how big the difference between two corpora is for a particular keyword. It’s also a very transparent statistic, in that it is easy to understand how it is calculated and why it represents the size of the difference.
When we present corpus frequencies, we usually give a relative frequency (or a normalised frequency as it is sometimes called): this is equal to the absolute frequency, divided by the size of the corpus or subcorpus. We often then multiply by a normalisation factor – 1,000 or 1,000,000 being the most usual factors – but this is, strictly speaking, optional and merely for presentation purposes.
Once we have made a frequency into a relative frequency by dividing it by the corpus size, we can compare it to the relative frequency of the same item in a different corpus. The easiest way to do this is to say how many times bigger the relative frequency is in one corpus as opposed to the other, which we work out by dividing one relative frequency by another. For instance, if the relative frequency of a word is 0.0006 in Corpus A and 0.0002 in Corpus B, then we can say that the relative frequency in Corpus A is three times bigger than in Corpus B (0.0006 ÷ 0.0002 = 3).
Dividing one number by another gives us the ratio of two numbers, so we can call this measure of the difference between the two corpora the ratio of relative frequencies (statisticians often call it the relative risk, for reasons I won’t go into here), and, as I’ve explained, it simply tells us how many times more frequent the word is in Corpus A than in Corpus B – so it’s a very transparent and understandable statistic.
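The arithmetic described above can be sketched in a few lines of code. This is purely an illustration of the calculation, not CQPweb’s actual implementation, and the frequencies and corpus sizes are invented:

```python
# Invented frequencies and corpus sizes, purely for illustration.

def relative_frequency(freq, corpus_size):
    """Absolute frequency divided by corpus size."""
    return freq / corpus_size

def freq_ratio(freq_a, size_a, freq_b, size_b):
    """How many times more frequent an item is in Corpus A than in Corpus B."""
    return relative_frequency(freq_a, size_a) / relative_frequency(freq_b, size_b)

# The worked example from the text: relative frequencies 0.0006 vs 0.0002
print(round(freq_ratio(6, 10_000, 2, 10_000), 6))  # 3.0
```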
We could use the ratio of relative frequencies as a keyness statistic but, in my view, it is useful to convert it into a logarithm (“log” for short) first – specifically, the logarithm to base 2 or binary logarithm. Why do this? Well, here’s how taking the log of the ratio works:
- A word has the same relative frequency in A and B – the binary log of the ratio is 0
- A word is 2 times more common in A than in B – the binary log of the ratio is 1
- A word is 4 times more common in A than in B – the binary log of the ratio is 2
- A word is 8 times more common in A than in B – the binary log of the ratio is 3
- A word is 16 times more common in A than in B – the binary log of the ratio is 4
- A word is 32 times more common in A than in B – the binary log of the ratio is 5
That is, once we take a binary log, every point represents a doubling of the ratio. This is very useful to help us focus on the overall magnitude of the difference (4 vs. 8 vs. 16) rather than differences that are pretty close together (e.g. 4 vs. 5 vs. 6). This use of the binary log is very familiar in corpus linguistics – the commonly-used Mutual Information measure, which is closely related to the ratio of relative frequencies, is also calculated using a binary log.
So now we’ve arrived at our measure – the binary log of the ratio of relative frequencies, or Log Ratio for short.
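As a sketch, the whole measure comes down to one line of arithmetic. Again the figures below are invented, and note that a real implementation must also handle the case where a word is absent from one corpus (which makes the ratio undefined); this sketch ignores that:

```python
import math

def log_ratio(freq_a, size_a, freq_b, size_b):
    """Binary log of the ratio of relative frequencies ('Log Ratio')."""
    rel_a = freq_a / size_a  # relative frequency in Corpus A
    rel_b = freq_b / size_b  # relative frequency in Corpus B
    return math.log2(rel_a / rel_b)

# Every doubling of the ratio adds one point to the score:
print(log_ratio(8, 1000, 2, 1000))   # ratio 4 -> score 2.0
print(log_ratio(16, 1000, 2, 1000))  # ratio 8 -> score 3.0
```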
If you followed the explanation above, then you know everything you need to know in order to interpret Log Ratio scores. If you didn’t follow it, then here’s the crucial takeaway: every extra point of Log Ratio score represents a doubling in size of the difference between the two corpora, for the keyword under consideration.
When we use Log Ratio for collocation, it has exactly the same interpretation, but applied to the zone around the node: every extra point of Log Ratio score represents a doubling in size of the difference between the collocate’s frequency near the node and its frequency elsewhere. The outcome is a collocation measure very similar to Mutual Information.
Another advantage of Log Ratio is that it can be used for lockwords as well as keywords, which log-likelihood can’t. A Log Ratio of zero or nearly zero indicates a word that is “locked” between Corpus A and Corpus B. In consequence the new version of CQPweb allows you to look at lockwords – to my knowledge, the first general corpus tool that makes this possible.
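A lockword check then falls out almost for free: compute the Log Ratio and test whether it is close to zero. The cutoff of 0.5 below is an invented illustration, not CQPweb’s actual threshold:

```python
import math

def is_lockword(freq_a, size_a, freq_b, size_b, threshold=0.5):
    """Treat a word as 'locked' between two corpora if its Log Ratio
    is close to zero. The 0.5 cutoff is invented for illustration."""
    score = math.log2((freq_a / size_a) / (freq_b / size_b))
    return abs(score) <= threshold

print(is_lockword(100, 100_000, 105, 100_000))  # True: nearly identical relative frequencies
print(is_lockword(100, 100_000, 400, 100_000))  # False: 4x more frequent in Corpus B
```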
A more formal discussion of Log Ratio will be at the core of my presentation at the ICAME conference later this week. A journal article will follow in due course.
Benefits Street was a series of television programmes broadcast on Channel 4 between 6th January and 10th February 2014 which, as Channel 4 has claimed, “sparked a national conversation about Britain’s welfare system”. The programme focussed on a community of people living in the economically deprived area of Winson Green, Birmingham, and specifically documented the families and individuals who live on James Turner Street.
Following the series of pre-recorded, documentary-style programmes (the last episode of which was aired on 16th February 2014), Channel 4 hosted a live debate entitled Benefits Britain which featured a range of public figures and those who were documented in Benefits Street. This report looks at a set of data collected on the date on which the Benefits Britain debate aired (17th February 2014).
The data selected to analyse reaction to this series were Tweets: short ‘micro-blogs’ that offer users the opportunity to voice their opinions and network with other viewers (e.g. using @ replies or # topics) in real-time. Tweets were collected from 00:00 on Sunday 16th February 2014 (the date of the final airing of Benefits Street) until 23:59 on Saturday 22nd February 2014, totalling one calendar week’s worth of Twitter data.
To do this, we used the Twitter API to collect any tweets which contained any of the following terms (note: matching was case-insensitive, so tweets containing these terms in upper or lower case were all collected):
- Benefits Britain
- James Turner
- Benefits Street
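The case-insensitive matching described above can be sketched as follows. The example tweets are invented, and the real collection was done through the Twitter API rather than by local filtering:

```python
# Search terms from the study, lower-cased for case-insensitive matching.
SEARCH_TERMS = ["benefits britain", "james turner", "benefits street"]

def matches(tweet_text):
    """True if the tweet contains any search term, ignoring case."""
    lowered = tweet_text.lower()
    return any(term in lowered for term in SEARCH_TERMS)

# Invented example tweets:
tweets = [
    "Watching Benefits Britain tonight",
    "Nothing to see here",
    "James Turner Street is on TV again",
]
kept = [t for t in tweets if matches(t)]
print(len(kept))  # 2
```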
This query returned 81,100 tweets which came in at a total of 1,501,938 words (tokens).
The #benefitsbritain hashtag was the most frequent token in the corpus, occurring 45,400 times (3.02% of all tokens). Channel 4 adopted the #BenefitsBritain hashtag immediately following the end of the Benefits Street programme, which had used the #BenefitsStreet hashtag; this earlier hashtag was used less often (0.86% of tokens) during the period in which the corpus was collected.
Several concerns are frequently expressed by users of the #BenefitsBritain hashtag. The word people is the most frequent ‘content word’ in tweets containing the #BenefitsBritain hashtag, occurring in 15.2% of those tweets, and occurs most frequently in the word cluster people on benefits. This cluster is associated with a number of verbs, including are, should, and have, which appear to be involved in evaluating who people on benefits are as well as their (perceived) behaviours.
Who people on benefits are
Some appear to be challenging the stereotype that benefits claimants are workshy or lazy:
- #benefitsbritain Some people on benefits are good people who’ve gone through a bad time not everyone on benefits are scumbags.
- don’t think people should comment on things until they have been in that situation. Not all people on benefits are lazy etc!#BenefitsBritain
- #BenefitsBritain am so annoyed that that show has stigmatised all people on benefits are scum when we all aren’t IT’S SO ANNOYING!!!!!
Some argue the absolute opposite:
- #BenefitsBritain kiss my ass i think most people on benefits are lazy and need to get a damn job!!!! Cut all benefits for able bodies people
Or assume that claiming benefits is a result of a lack of skills or underlying criminality:
- Half the people on benefits are unemployable stop there benefit and they commit crime and it costs more to imprison them #BenefitsBritain
And some are somewhat more ambivalent:
- #BenefitsBritain Not all people on benefits are lazy, but if it becomes a lifestyle its dangerous territory, idle minds are the devils work.
What people on benefits do
In terms of evaluating what people on benefits do, a number of users question the (perceived) behaviours of those claiming benefits:
- Fail to see why some people on benefits are allowed to spend their money on drink, cigarettes and drugs #BenefitsBritain
- watching the debate #BenefitsBritain most people on benefits have a criminal record now who wants to give them people a chance no one
Others propose possible restrictions on (perceived) social and spending behaviours:
- Why don’t people on benefits have vouchers instead of money? Then they wouldn’t spend it on drink and drugs #BenefitsBritain
- I stand by the fact that people on benefits should not have children when they cant afford to feed themselves. #BenefitsBritain
- Agree with the guy who said people on benefits should be given food stamps #BenefitsBritain
Or suggest certain behavioural conditions be fulfilled in order to claim benefits:
- People on benefits should be made to go out&do something before they get money volunteering or something!! #BenefitsBritain #BenefitsStreet
- Active people on benefits should earn their benefits through voluntary work to assist the community #BenefitsBritain
- People on benefits should only get paid if they do voluntary/training work. Then there is some progress in their lives. #BenefitsBritain
Some argue that people are workshy:
- People on benefits have lacked the ability to work hard in education there for getting a low paid job or none at all #BenefitsBritain
- #BenefitsBritain all people on benefits should get of their arse and work like the rest of us do everyday
Or have a grudge against those who work:
- What is it that some people on benefits have against working class people who’ve been successful? #benefitsdebate #BenefitsBritain
Two specific names were also frequent in tweets using the #BenefitsBritain hashtag.
The first is the host of the Benefits Britain debate, Richard Bacon. Mainly, those who spoke about Bacon brought his abilities as a host into question. One of the more creative and less direct insults was:
- Richard Bacon is a cross between Jeremy Kyle & Kilroy! @Channel4 would have been better off getting @rickedwards1 hosting #BenefitsBritain
The second person featuring frequently was (White) Dee, a prominent personality in the Benefits Street programme. The response to her was mainly positive, although there were some negative reactions:
- #BenefitsBritain always the governments fault -what nonsense Dee will never look at herself and see what a lazy scroungers she is
- My view on #BenefitsBritain Richard Bacon is a cock oh and White Dee is a sweaty lazy cow
Aside from the #BenefitsBritain hashtag, the next most frequent token in the corpus was the determiner ‘the’. The fourth most frequent token was the word ‘to’, which can be interpreted either as a preposition or as part of infinitive verbs. Looking at clusters in which to occurs revealed that in fact to occurred within a number of infinitive verb forms. I look here at the 3 most frequent: to be, to work, and to get, to see how infinitives work within the #BenefitsBritain tweet corpus and what ideas they are used to express.
The infinitive verb to be was frequently found being used in a number of interesting ways.
Users were excited that the Benefits Debate was going to be interesting:
- #BenefitsBritain this is going to be interesting!
And frequently challenged the stereotype that only poor people are drug addicts, as with this retweet:
- ‘Billionaire’s Row residents are as likely to be drug addicts as people on Benefits Street’ says MP Chris Bryant http://t.co/JG750GrJE5
When found in the cluster need to be people and work again became central to debate:
- Finally people talking about politics. Reality is we need to be paying people a living wage vote labour #benefitsstreet
- Benefits is like a Government Drug. These people need to be weaned off the drug and get a job! #BenefitsBritain #BenefitStreet
The infinitive to work not only most frequently occurs in the word cluster want to work, but is also closely associated with different ways of referring to people, either through pronouns (they, everyone, I), or the most frequent ‘content word’ in the corpus, people. As such, the formation want to work is found in tweets expressing general opinions about the desirability of work:
- Some people do want to work but it’s not as simple sick people are getting harassed to work when they are not fit #BenefitsBritain
- Majority of disabled and unemployed people want to work #BenefitsBritain #BenefitStreet
- I am so sick of hearing, make work pay, incentivize people to work. People want to work. The jobs don’t pay a living wage #benefitsbritain
Moreover, want to work is strategically used in straw man arguments against the idea that people want to work:
- These people clearly want to work? Really??? has he watched the same programme? #BenefitStreet #BenefitsBritain
And it frequently collocates with negative forms such as don’t and doesn’t, in examples such as the following, which express the idea that those claiming benefits see work as undesirable:
- Let’s be real most of the people on the programme don’t really want to work anyway #BenefitsBritain
- If we’re being fair…there are also A LOT of people on benefits who definitely DON’T want to work… #BenefitsDebate #BenefitsBritain
- #BenefitsStreet there is an inherent problem with some ppl in this country; they don’t want to work! Send them overseas; no benefits
- #BenefitsBritain Not all people on benefits want to work just come #skelmersdale for the next series. Wont need no editing or bribes!!
To get is the third most frequent infinitive verb formation and occurs most frequently in the phrase to get a job. Underpinning how this phrase is used is a moralised debate surrounding (un)employment which naturalises and elevates the status of employment and the employed and alienates and derides unemployment and the unemployed; having a job makes you good, having no job makes you bad. This is borne out by the data.
This includes talking about the difficulty of getting a job:
- “#BenefitsBritain makes a lot of valid points, you need experience to get a job, you need experience to get experience! Can never win!”
- #benefitsstreet #BenefitsBritain is all the fault of #thatcher who closed everything down then #cameron who makes it difficult to get a job
- #BenefitsBritain to get a job it’s not all what you know its who you know #thesystemsfucked
As well as reactions against pressure to work within a climate where work is hard to find:
- These guys on Benefits Britain thinking it’s so easy to get a job. Get back to reality you stuck up twats! #BenefitsBritain #BenefitsStreet
However, most of the uses of the to get a job phrase target jobseekers and construct them in relation to prejudices and assumptions about the (un)employed:
- Why is everyone too scared to stand up and say ‘work harder to get a job/off drugs/off drink’? #BenefitsBritain
- #benefitsstreet this show makes me so angry.. Get off your fat ass and try to get a job instead of sponging off the country
- Fuck this Benefits Street debate is making me angry. Lazy twats need to get a fucking job.
- #BenefitsBritain kiss my ass i think most people on benefits are lazy and need to get a damn job!!!! Cut all benefits for able bodies people
This data highlights a kind of moralisation of (un)employment, where the ideologies underpinning this moralisation are both reinforced and challenged. The data reveals a number of apparently stable linguistic formations used to talk about unemployed benefits claimants, which expose aspects of the ideological underpinnings of the debate.
By the ‘Metaphor in End-of-Life Care’ project team, funded by the UK’s Economic and Social Research Funding Council (ESRC):
Elena Semino, Veronika Koller, Jane Demmen, Andrew Hardie, Paul Rayson, Sheila Payne (Lancaster University) and Zsófia Demjén (Open University)
Recent media controversy over the use of social media by people with terminal illness has sparked a new debate on ‘fight’ metaphors for cancer. Writing in the New York Times on 12th January 2014 about Lisa Bonchek Adams’s blogging and tweeting, Bill Keller describes her as having ‘spent the last seven years in a fierce and very public cage fight with death’. On the one hand, Keller acknowledges that Bonchek Adams’s ‘decision to treat her terminal disease as a military campaign has worked for her’. On the other hand, he favourably compares his own father-in-law’s ‘calm death’ with what he describes as Bonchek Adams’s choice to be ‘constantly engaged in battlefield strategy with her medical team’.
As part of the ESRC-funded project ‘Metaphor in End-of-Life Care’ at Lancaster University, we are studying the use of ‘fight’ metaphors by cancer patients in a large collection of interviews and online fora. We have found plenty of evidence of the negative sides of these metaphors, which have been criticised by many patients and commentators before Keller, and most famously by Susan Sontag in Illness as Metaphor (1979). Seeing illness as a fight can make people feel inadequate and responsible if they do not get better, as when a patient in our data writes: ‘I feel such a failure that I am not winning this battle’. Military metaphors can also express distressing ways of perceiving oneself, such as when some patients describe themselves as ‘time bombs’ during periods of remission.
On the other hand, we are finding that, for some patients, ‘fight’ metaphors do seem to provide meaning, purpose and a positive sense of self. For example, writing in an online forum, a cancer sufferer proudly comments: ‘my consultants recognised that I was a born fighter’. Another patient says in an interview: ‘I don’t intend to give up; I don’t intend to give in. No I want to fight it. I don’t want it to beat me, I want to beat it. Because I don’t think we should give up trying.’ ‘Fight’ metaphors are also used to give and receive encouragement and solidarity. For example, a patient writes ‘let me hear you scream the battle cry to spur us on to win this war’, while another ends an online forum post with the words ‘Soldier on everybody’.
We would not go as far as to argue that ‘fight’ metaphors should be rehabilitated: they can do real harm, and nobody should ever feel under pressure to see themselves as fighters. However, as with most metaphors, the implications of ‘fight’ metaphors change depending on who uses them, why, where and how. Our data suggest that they can be helpful enough to be recognised and accepted as one of many possible ways of approaching illness, including its terminal phase.
The Economic and Social Research Council (ESRC) is the UK’s largest organisation for funding research on economic and social issues. It supports independent, high quality research which has an impact on business, the public sector and the third sector. The ESRC’s total budget for 2013/14 is £212 million. At any one time the ESRC supports over 4,000 researchers and postgraduate students in academic institutions and independent research institutes.
by Love, R., McEnery, T. & Wattam, S.
The ESRC-funded Centre for Corpus Approaches to Social Science (CASS) at Lancaster University has undertaken some preliminary research into the immediate reaction on Twitter to the sentencing of the Lee Rigby murderers on Wednesday 26th February 2014. This document summarises our findings.
On the afternoon of Wednesday 22nd May 2013, British soldier Lee Rigby was murdered by two men, Michael Adebolajo and Michael Adebowale, near the Royal Artillery Barracks in Woolwich, London. The attack, which was carried out in broad daylight, quickly became a major national news story. In December 2013 the perpetrators were found guilty of murder and were sentenced on Wednesday 26th February 2014. Adebolajo received a whole-life sentence (meaning he will never be released) and Adebowale received a life sentence with a minimum term of 45 years imprisonment.
How the research was carried out
We carried out our research by using the Twitter API to collect a large number of tweets that referred to the Rigby case in some way, between 00.00 and 23.59 on Wednesday 26th February 2014. All tweets containing one or more of the following terms were included in our search:
rigby, adebolajo, adebowale, woolwich trial, woolwich sentence, woolwich sentencing, justice Sweeney, #leerigby, #rigbytrial, #rigbysentence, #woolwich, #woolwichmurder, #woolwichattack, #woolwichtrial
Using these search terms we collected a total of 57,097 tweets over the 24-hour period, including retweets (RTs), quotes etc. This amounted to a total of 1,109,136 words of Twitter discussion about the case. We then used a set of tools and methods developed in corpus linguistics to find out how Twitter users discussed the sentencing on the day of the decision.
The following is a selection of preliminary findings based on the analysis of the tweets.
- Nearly two thirds of the tweets were retweets
Nearly 35,000 tweets (60.1%) included the retweet abbreviation RT. This confirms that Twitter discussion of the Lee Rigby case was heavily retweeted and shared by Twitter users. The top ten most frequently retweeted Twitter handles appear to have been:
| Rank | Handle | Description |
|------|--------|-------------|
| 1 | @bbcbreaking | Breaking news account for BBC News |
| 2 | @skymarkwhite | Home Affairs Correspondent for Sky News |
| 3 | @skynewsbreak | Breaking news account for Sky News |
| 4 | @poppypride1 | An “independent account supporting all troop charities” |
| 5 | @jakeleonardx | Young footballer at Crewe Alexandra Academy |
| 6 | @itvnews | Main account for ITV News |
| 7 | @courtnewsuk | News reports account for the Old Bailey |
| 8 | @thesunnewspaper | Main account for The Sun newspaper |
| 9 | @bbcnews | Main account for BBC News |
Based on these it seems that the most popular form of Twitter interaction relating to the Rigby sentencing was to retweet news updates from well-known news providers including BBC News, Sky News, ITV News and The Sun. @jakeleonardx is not a celebrity (he has fewer than 1,000 followers), but when he tweeted a photo of Lee Rigby’s son with the caption “Poor little lad, RIP Lee Rigby”, it was retweeted nearly 1,000 times. @unnamedinsider appears to be better known (with over 34,000 followers), and posted two tweets ridiculing the BNP and EDL protesters who had gathered outside the Old Bailey for the sentencing.
- The most salient word (apart from names and Twitter terms) was life
Twitter users were very concerned with the nature of the sentence delivered, using the word ‘life’ 19,498 times (34.1% of tweets). The most common three-word phrase this was used in was life in prison (4,369 times, 7.7% of tweets), suggesting that Twitter users were concerned not with the loss of life but with the restriction of the perpetrators’ liberty.
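Finding the most common three-word phrase of this kind is a simple n-gram count. A minimal sketch, using invented example tweets rather than the actual corpus:

```python
from collections import Counter

def top_trigrams(tokenised_tweets, n=1):
    """Count three-word phrases (trigrams) across a list of tokenised tweets."""
    counts = Counter()
    for tokens in tokenised_tweets:
        for i in range(len(tokens) - 2):
            counts[tuple(tokens[i:i + 3])] += 1
    return counts.most_common(n)

# Invented example tweets:
tweets = [
    "life in prison is not enough".split(),
    "they got life in prison".split(),
]
print(top_trigrams(tweets))  # [(('life', 'in', 'prison'), 2)]
```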
- Some Twitter users wanted more than whole-life terms for the perpetrators
As well as whole-life terms, Twitter users strongly expressed their opinion about other punishments they deemed suitable for the perpetrators. In particular, highly salient words like rot, deserve, should and hang indicate this. The most popular three-word expression relating to such desired punishments is rot in hell. Furthermore the word deserve occurred 1,295 times (2.3% of tweets), an indication of a clear evaluation of the sanction proposed: popular four-word phrases containing deserve included deserve a life sentence, deserve to be hung, and deserve the death penalty. Likewise the word should is almost exclusively used to wish death upon the perpetrators of the murder, while hang relates to the most popular way in which Twitter users wanted capital punishment to be undertaken upon the killers.
- Michael Adebolajo was discussed more than Michael Adebowale
The surname ‘Adebolajo’ was tweeted 15,092 times (26.4% of tweets), compared to ‘Adebowale’ being tweeted only 11,729 times (20.5% of tweets). This indicates that the perpetrator who received the whole-life sentence was of more concern to tweeters than the perpetrator who received the less severe punishment.
- The most salient word used to describe Adebolajo and Adebowale was scum, and the most salient swear word was cunts
Twitter’s word of choice for the perpetrators was scum, which occurred 1,466 times (2.6% of tweets). Popular phrases included ‘the scum’, ‘this scum’, ‘two scum’, ‘them scum’ and ‘those scum’, and popular words that combined with scum include absolute, fucking, murdering and jihadi. Furthermore, the swear word cunts was used 800 times in tweets about the Rigby sentencing (1.4% of tweets). This further indicates that, as expected, there was considerable disapproval and anger expressed towards the perpetrators. Words that combined with cunts to describe the perpetrators included dirty, sick, horrible, fucking, evil, scummy, vile, muslim, murdering and filthy.
- In terms of religion, Twitter users were most concerned about Islam
The three most salient religious words were islamistas, Islam and Muslim. Islamistas (Spanish for Islamists) occurred in Spanish-language tweets reporting the result of the sentencing (though most tweets were produced in English, and by users from the UK, there appears to have been activity from all over the world). The other terms mostly occur in retweets and discussions about the judge’s statement that the perpetrators had betrayed Islam by murdering Rigby. The general opinion appears to be that the murder had nothing to do with the religion of Islam.
This preliminary analysis, using tools and methods from corpus linguistics, has captured a general impression of the Twitter reaction to the sentencing of the Lee Rigby murderers. It seems that the main reaction centred around the nature of the sentencing and the Twitter users’ wishes for both Michael Adebolajo and Michael Adebowale to receive at least a whole-life sentence but preferably death. Furthermore some Twitter users appeared unrestrained in their willingness to use offensive language to describe the killers.
Notes:
- As many tweets as possible were collected, but given the immediacy of the event and the nature of the search method, we acknowledge that Twitter users may have tweeted about the Rigby trial without using any of these terms.
- The retweet figure may have been even higher if we take into account retweets that do not contain the letters ‘RT’.
The Changing Climates project is a corpus-based investigation of discourses around climate change. It aims to examine how climate change has been framed in media coverage across Britain and Brazil in the past decade. Here, we look at two different scenarios. Recent surveys have shown that climate change is currently considered a high-priority concern within Brazil, with the country showing a higher degree of concern than almost anywhere else. By contrast, climate change scepticism is increasingly prominent in the British public sphere.
We are pleased to announce that we have just finished collecting the data. The Brazilian corpus contains about 8 million words, comprising texts from 12 newspapers. The British corpus is much larger. It has nearly 80 million words and includes texts published by all major British broadsheet and tabloid papers.
The CASS-affiliated Metaphor in End of Life Care project has just released a free resource containing information of interest to many of our readers. Download the document now to learn more about the project, from basic concepts (what is metaphor, and how is it used in everyday life?) to more specific details (why study metaphor in end-of-life care?). Some interesting initial findings are also included. For instance, “Family carers often say that their emotions can only be safely ‘released’ when talking to people who are ‘in the same boat’.” Read on to learn more about the project.
I wrote UCAG during a sabbatical as a semi-sequel to a book I published in 2006 called Using Corpora for Discourse Analysis. Part of the reason for the second book was to update and expand some of my thinking around discourse- or social-related corpus linguistics. As time has passed, I haven’t become disenamoured of corpus methods, but I have become more reflective and critical of them, and I wanted to use the book to highlight what they can and can’t do, and how researchers need to guard against using tools which might send them down a particular analytical path with a set of pre-ordained answers. Part of this has involved reflecting on how interpretations and explanations of corpus findings often need to come from outside the texts themselves (one of the tenets of critical discourse analysis), and subsequently whether a corpus approach requires analysts to go further and critically evaluate their findings in terms of “who benefits”.
Another way in which my thinking around corpus linguistics has developed since 2006 is in considering the advantages of methodological triangulation (or approaching a research project in multiple ways). In one analysis chapter I take three small corpora of adverts from Craigslist and try out three methods of attempting to uncover something interesting about gender from them – one very broad involving an automated tagging of every word, one semi-automatic relying on a focus on a smaller set of words, and another much more qualitative, relying on looking at concordance lines only. In another chapter I look at “difficult” search terms – comparing two methods of finding all the cases where a lecturer indicates that a student has given an incorrect answer in a corpus of academic-related speech. Would it be better to just read the whole corpus from start to finish, or is it possible to devise search terms so concordancing would elicit pretty much the same set?
The book also gave me a chance to revisit older data, particularly a set of newspaper articles about gay people from the Daily Mail which I had first looked at in Public Discourses of Gay Men (2005). As a replication experiment I revisited that data and redid an analysis I had first carried out about 10 years ago. While the idea of an objective researcher is fictional, corpus methods have aimed to redress the issue of researcher bias to an extent – although in retreading my steps, I did not obtain exactly the same results. Fortunately, the overall outcome was the same, but there were a few important points that the version of me from 10 years earlier missed. Does that matter? I suspect it doesn’t invalidate the analysis, although it is a useful reminder of how our own analytical abilities alter over time.
Part of the reason for writing the book was to address other researchers who are either from corpus linguistics and want to look at gender, or who do research in gender and want to use corpus methods. I sometimes feel that these two groups of people do not talk to each other very much, and as a result the corpus research in this area is often based around the “gender differences” paradigm, where the focus is on how men and women apparently differ from each other in language use (with attendant metaphors about Mars and Venus). Chapters 2 and, to an extent, 3 address this by trying a number of experiments to see just how much lexical variation there is in sets of spoken corpora of male and female language – and, when difference is found, how it can be explained. I also warn against lumping all men together into a box to compare them with all women, who are put in a second box. The variation within the boxes can actually be the more interesting story to tell, and this is where corpus tools around dispersion can really come into their own. So even if, for example, men do swear more than women, it’s not all men and not all the time. On the other hand, some differences which are more consistent and widespread can be incredibly revealing, although not in ways you might think – chapter 2 took me down an analytical path that ended up at the word Christmas – not perhaps an especially interesting word relating to gender, but it produced a lovely punchline to the chapter.
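To illustrate the point about dispersion, here is a minimal sketch (not taken from the book) of one standard dispersion measure, Juilland’s D, computed over invented per-speaker frequencies of a swear word; the data and speaker counts are entirely hypothetical, and real corpus tools offer this and several other measures:

```python
import math

def juilland_d(freqs):
    """Juilland's D over equal-sized corpus parts:
    1.0 = perfectly even spread, values near 0 = concentrated in few parts."""
    n = len(freqs)
    mean = sum(freqs) / n
    if mean == 0:
        return 0.0
    sd = math.sqrt(sum((f - mean) ** 2 for f in freqs) / n)
    v = sd / mean  # coefficient of variation
    return 1 - v / math.sqrt(n - 1)

# Hypothetical frequencies of a swear word across five equal-sized speaker samples
even = [4, 5, 3, 4, 4]      # most speakers use it a little
skewed = [19, 1, 0, 0, 0]   # one speaker accounts for nearly all uses

print(juilland_d(even))
print(juilland_d(skewed))
```

Both lists have the same total frequency (20), so a raw keyword comparison would treat them identically; the dispersion score is what reveals that the second pattern is driven by a single speaker.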
It was also good to introduce different corpora, tools and techniques that weren’t available in 2006. Mark Davies has an amazing set of online corpora, mostly based around American English, and I took the opportunity to use the COHA (Corpus of Historical American English) to track changes in language which reflect male bias over time, from the start of the 19th century to the present day. Another chapter utilises Adam Kilgarriff’s online tool Sketch Engine, which allows collocates to be calculated in terms of their grammatical relationships to one another. This enabled a comparison of the terms boy and girl, letting me consider verbs that position either as subject or object. So girls are more likely to be impressed while boys are more likely to be outperformed. On the other hand, boys cry whereas girls scream.
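The underlying idea of grammatically-informed collocation can be sketched very simply: once a corpus has been dependency-parsed, you count which verbs take a given noun as subject versus object. The sketch below is not Sketch Engine’s actual implementation, and the triples are invented for illustration (chosen to echo the examples above):

```python
from collections import Counter

# Hypothetical dependency triples (verb lemma, relation, noun lemma),
# as might be extracted from a parsed corpus
triples = [
    ("cry", "nsubj", "boy"), ("cry", "nsubj", "boy"),
    ("scream", "nsubj", "girl"), ("scream", "nsubj", "girl"),
    ("impress", "dobj", "girl"),
    ("outperform", "dobj", "boy"),
]

def verbs_for(noun, relation):
    """Count verbs occurring with `noun` in the given grammatical relation."""
    return Counter(v for v, rel, n in triples if n == noun and rel == relation)

print(verbs_for("boy", "nsubj").most_common())   # verbs boys do
print(verbs_for("girl", "dobj").most_common())   # verbs done to girls
```

Separating subject from object slots is what makes the boy/girl contrast visible: a plain window-based collocation would conflate “girls scream” with “outperform girls”.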
It would be great if the book inspired other researchers to consider the potential of using corpora in discourse/social related subjects as well as showing how this potential has expanded in recent years. It’s been fun to explore a relatively unexplored field (or rather travel a route between two connecting fields) but it occasionally gets lonely. I hope to encounter a few more people heading in the same direction as me in the coming years.
Tuesday 7th January saw John Nimmo and Isabella Sorley plead guilty to sending messages “menacing” in nature to feminist campaigner Caroline Criado-Perez and Walthamstow MP Stella Creasy via multiple Twitter accounts.
In July 2013, Criado-Perez had been successful in campaigning for author Jane Austen to appear on the £10 bank note. Shortly afterwards, in the final days of July and spilling into August, a torrent of abuse was directed at Criado-Perez, including numerous threats to sexually abuse, rape, torture, and kill the campaigner. After lending Criado-Perez support on the social networking site, Creasy was also targeted by abusive users.
The prosecution identified abusive traffic from 86 different Twitter accounts, several of which belonged to the defendants.
The court heard from prosecutor Alison Morgan that Criado-Perez felt “significant fear” due to the menacing nature of the tweets, which have had “life changing psychological effects”. Creasy reported that both her personal and professional life were affected by the messages.
Sorley held her face in her hands as the prosecutor read aloud some of her offending tweets, which included:
“You’re wasting shits loads of time because you can’t handle rape threats, pathetic! Rape is the last of your worries!!!!”
“rape?! I’d do a lot worse things than rape you!!”
“I will find you and you don’t want to know what I will do when I do, you’re pathetic, kill yourself before I do #godie”
When arrested in October of 2013, Sorley admitted to sending the abusive tweets, saying that she was “bored” and that “I was off my face on drink” at the time, although she accepted that some tweets could be perceived as death threats.
Nimmo, on the other hand, was arrested in July 2013 after having been tracked down by a Newsnight reporter, and gave no comment when arrested. His defence claimed that he is a “social recluse” whose “social interaction, social life, is online” as a result of being “systematically bullied at secondary school, both physical and verbal”. As a result of this social exclusion, his defence claimed, Nimmo has “no social life, no friends, he strives for popularity” and his “outrageous comments [were] made for retweets”.
Both Sorley and Nimmo pleaded guilty under Section 127 of the Communications Act (2003) and are to appear before Westminster Magistrates’ Court later this month.
I travelled to the court to witness the trial as part of work undertaken for the research project Discourse of Online Misogyny (DOOM) here at CASS. Our initial aim is to investigate the ways in which language was used in the threats made against Caroline Criado-Perez and Stella Creasy on Twitter. Building on this, we will produce sophisticated analytical tools to provide critical analyses of language and other kinds of behaviours which emerge during instances of online abuse (such as network building).
Claire Hardaker, Lecturer in English Language and Principal Investigator of the DOOM project, appeared on the 7 January 2014 edition of Newsnight.
You can also read a summary of this work in a complimentary CASS: Briefing.
A forthcoming special issue of Corpus Linguistics and Linguistic Theory, which is guest-edited by Dr Richard Xiao and Professor Naixing Wei, President of the Corpus Linguistics Society of China, is now available online as Ahead of Print at the journal website.
This special issue focuses on corpus-based translation and contrastive linguistic studies involving two genetically different languages, namely English and Chinese, which we believe form an important interface whose unique features result from the mutual interaction between the two languages.
Corpora have tremendously benefited translation and contrastive studies, and in the meantime, corpus-based translation and contrastive linguistic studies have also significantly expanded the scope of corpus linguistic research. While contrastive linguistics and translation studies have traditionally been accepted as two separate disciplines within applied linguistics, there are many contact points between the two; and with the common corpus-based approach and the usually shared type of data (e.g. comparable and parallel corpora), corpus-based translation and contrastive linguistic studies have become even more closely interconnected, as demonstrated by the articles included in this special issue.
This special issue of Corpus Linguistics and Linguistic Theory includes five research articles together with an extensive introduction written by the guest editors.
- Translation and contrastive linguistic studies at the interface of English and Chinese: Significance and implications, by Richard Xiao and Naixing Wei
- Lexical and grammatical properties of Translational Chinese: Translation universal hypotheses reevaluated from the Chinese perspective, by Richard Xiao and Guangrong Dai
- What is peculiar to translational Mandarin Chinese? A corpus-based study of Chinese constructions’ load capacity, by Kefei Wang and Hongwu Qin
- Structural and semantic non-correspondences between Chinese splittable compounds and their English translations: A Chinese-English parallel corpus-based study, by Jiajin Xu and Xiaochen Li
- Exploring semantic preference and semantic prosody across English and Chinese: Their roles for cross-linguistic equivalence, by Naixing Wei and Xiaohong Li
- A corpus-based variationist approach to bei passives in Mandarin Chinese, by Hongjie Guo and Daryl Chow
These studies combine contrastive analysis and translation studies on the basis of comparable corpora (either multilingual or monolingual) and parallel corpora of English and Chinese, two of the most widely spoken world languages, which differ genetically. While the decision to involve English and Chinese in the research reported in this volume was largely based on the authors’ strong languages (they are all competently bilingual in Chinese and English), the significance of the typological distance between the two languages covered in these studies cannot be overstated. In comparison with studies of typologically related languages, translation and cross-linguistic studies of genetically distant languages such as English and Chinese can have more important theoretical implications for linguistic theorization. Studying such language pairs helps us gain a better appreciation of the scale of variability in the human language system, while theories and observations based on closely related language pairs can give rise to conclusions which seem certain but which, when studied in the context of a language pair such as English and Chinese, become not merely problematized afresh, but significantly more challenging to resolve (cf. Xiao and McEnery 2010).
The studies reported in this special issue embody features at the interface of English and Chinese, and can be expected to have important theoretical and practical implications for linguistic theorizing.