#CL2015 social media roundup: Using Corpus Linguistics to investigate Corpus Linguists talking about Corpus Linguistics


Corpus Linguistics 2015 – CL2015 – is the largest conference of its kind and this year drew over 250 attendees from all over the world to present work outlining the state of Corpus Linguistics (CL) at large, showcasing leading-edge technology and methods, and setting the agenda for years to come.

Of particular interest to me was a small but important streak of enquiry running through the conference, which is also becoming more prevalent in CL as a whole: a focus on corpora collected from online sources such as blogs and social media (Elgesem & Salway 2015; Grieve, et al. 2015; Hardaker & McGlashan 2015; Knight 2015; Longhi & Wigham 2015; McGlashan & Hardaker 2015; Statache, et al. 2015). The Internet now offers great opportunities for the collection and interrogation of large amounts of data – big data, even – and the rapid compilation of specialised corpora in ways previously impossible.

I focus here on social media data, specifically data collected from Twitter. Sampling data from Twitter, like a lot of other online sources, offers the opportunity to collect what people are saying (the content of their posts, i.e. tweets) but also a huge amount of metadata about the date, time, user, shared content (e.g. hyperlinks, retweets), interactional information, etc. relating to those posts. As Corpus Linguists, we therefore get the data we sample for – posts containing the thing(s) we are interested in – as well as other social information about the content creators and their social networks that we may or may not be interested in. Indeed, the kinds of metadata attached to online posts have sparked a great deal of debate about the ethics of collecting and using publicly posted online content, though these concerns are not discussed here. Instead, the potential for online ethnography is explored. In order to do this, I pair familiar CL research methods with methods from Social Network Analysis (SNA) that are more explicitly focussed on social networks and on examining the myriad ways people affiliate with each other.

Theory & Methods: Corpus-assisted Community Analysis (CoCoA)

Corpus-assisted Community Analysis (CoCoA) is a multimethodological approach to the study of online discourse communities that combines methods from Discourse Analysis (DA), CL, and SNA.

Corpus-assisted Discourse Analysis

I predominantly draw on Baker (2006) in my approach to corpus-assisted DA, seeing discourse in a Foucauldian sense as forms of social practice: “practices which systematically form the objects of which they speak” (Foucault 1972: 49). In particular, I am interested in the incremental effect of discourse. Baker suggests, “a single word, phrase or grammatical construction on its own may suggest the existence of a discourse” (2006: 13). However, in order to investigate how quantitatively typical or pervasive a discourse is within a discourse community, numerous examples of linguistic instantiations of that discourse are required to make a claim about its cumulative effect (ibid.). Following Baker, I argue here that corpora and CL techniques enable this kind of quantitative examination of discourse.

Social Network Analysis

SNA implements notions from graph theory to formally model and describe the properties of relationships between objects of study such as people and institutions. A graph (or ‘sociogram’) is a representation of people or institutions of interest as ‘nodes’ and the relationships between them as a set of lines known as ‘edges’; a graph is built by representing “a set of lines [‘edges’] connecting points [‘nodes’]” (Scott 2013: 17). To interpret graphs, graph theory contributes “a body of mathematical axioms and formulae that describe the properties of the patterns formed by the lines [‘edges’]” (Scott 2013: 17). One of these axioms is ‘directionality’. Directed graphs can encode both symmetric and asymmetric relations (D’Andrea, et al. 2010: 12). Asymmetric relationships are those in which nodes are connected by an edge that has a direction of flow from one node to another, as illustrated by the relations between A and C, and C and B in Fig. 1. Symmetric relationships are those in which an edge connects two nodes but is bidirectional – the direction of relation flows both ways – as illustrated by the relationship between A and B in Fig. 1. Directed relationships on Twitter include followership relations and the act of mentioning – i.e. including another user’s handle (e.g. @CorpusSocialSci) – in tweets.


Figure 1: A simple directed graph
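
To make the distinction between symmetric and asymmetric relations concrete, here is a minimal sketch in Python of a directed graph like the one in Fig. 1, stored as a set of (source, target) pairs. The arrow directions chosen for A–C and C–B are my own assumption for illustration (the text only says that these relations are asymmetric), and the function name `classify` is hypothetical.

```python
# Directed edges stored as (source, target) pairs.
# ASSUMPTION: the directions A->C and C->B are illustrative only; the text
# states that these two relations are asymmetric but not which way they point.
edges = {("A", "C"), ("C", "B"), ("A", "B"), ("B", "A")}


def classify(edges):
    """Split directed edges into symmetric pairs and asymmetric edges."""
    symmetric, asymmetric = set(), set()
    for source, target in edges:
        if (target, source) in edges:
            # The relation flows both ways, so record the pair once.
            symmetric.add(frozenset((source, target)))
        else:
            asymmetric.add((source, target))
    return symmetric, asymmetric


symmetric, asymmetric = classify(edges)
```

Here the A–B relation ends up in `symmetric`, while the two one-way relations end up in `asymmetric`. On Twitter, the same structure could hold for mentions: an edge would be symmetric only if both accounts mention each other.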

Undirected graphs represent identical, symmetric relationships between nodes which might be the result of nodes sharing reciprocal attitudes or “because they have a common involvement in the same activity” (Scott 2013: 17). Fig. 2 gives a graphical representation of an undirected graph.


Figure 2: A simple undirected graph

Directed and undirected (‘ambient’) kinds of affiliation are both understood here as being distinct forms of discursively constructed social practices. Furthermore, I adopt the term ‘ambient affiliation’ from the work of Zappavigna on the use of social media in the formation of community and identity (Zappavigna 2012; Zappavigna 2013). Ambient affiliation is about the functionalities of social media platforms that enable users “to commune with others without necessarily engaging in direct conversational exchanges” (Zappavigna 2013: 223-4). Therefore, ambient affiliation is about people exhibiting the same behaviours or sharing the same qualities but without directly interacting with each other. This notion closely approximates the notion of an ‘undirected’ graph. In developing the theory of ambient affiliation, Zappavigna draws on Page’s work on hashtags. Page refers to hashtags as “a search term” (2012: 183). Hashtags – a string of characters (usually a word or short phrase) unbroken by spaces or non-alphabetic/non-numeric characters (excl. underscores ‘_’) preceded by ‘#’ (e.g. #YOLO) – are used as metadiscursive markers of the topic of a tweet. Page goes on to argue that, “the kind of talk which aggregate around hashtags […] involve multiple participants talking simultaneously about the same topic, rather than individuals necessarily talking with each other in dyadic exchanges that resemble a conversation” (2012: 196). As such, Page suggests that hashtags destabilise the conventional adjacency pairs characteristic of many forms of human dialogue and give humans a new way to interact on a topic of mutual interest.
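
The definition of a hashtag given above translates fairly directly into a regular expression. The following Python sketch is only an approximation under my own assumptions: `\w` also matches non-English letters, which is slightly broader than the definition in the text, and the function names are hypothetical. It groups users by the hashtags they share, which is one simple way of operationalising ambient affiliation.

```python
import re

# '#' followed by one or more letters, digits, or underscores.
HASHTAG = re.compile(r"#(\w+)")


def hashtags(tweet):
    """Return the hashtags in a tweet, lower-cased so #CL2015 and #cl2015 match."""
    return [tag.lower() for tag in HASHTAG.findall(tweet)]


def group_by_hashtag(tweets):
    """Map each hashtag to the set of users who used it.

    `tweets` is an iterable of (user, text) pairs. Users who share a hashtag
    are ambiently affiliated even if they never mention each other.
    """
    groups = {}
    for user, text in tweets:
        for tag in hashtags(text):
            groups.setdefault(tag, set()).add(user)
    return groups
```

Lower-casing matters in practice: the data discussed below contain both #CL2015 and #cl2015, which should count as the same topic marker.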


I collected all tweets and retweets including the official hashtag of the Corpus Linguistics 2015 conference – #CL2015 – posted from the date of the first pre-conference workshop (20/07/2015) until the final day of the conference (24/07/2015). To do this, I used the R-based Twitter client ‘twitteR’ to access the Twitter API. The resulting data amounted to:

           Total number
Tweets     671
Retweets   1025


Date         Tweets   Retweets
20/07/2015   57       76
21/07/2015   128      169
22/07/2015   152      370
23/07/2015   176      241
24/07/2015   158      169
Totals       671      1025

The tweets corpus contained around 10,000 words in total.


The data contained some ‘noise’ mainly caused by other people using the same #CL2015 hashtag to talk about another event occurring during the period of the conference. However, as I will show in the analysis, the methods enable researchers to focus only on the communities they are interested in.


Tweets – what was being talked about?

To find out what people were talking about day-to-day, I created daily tweet corpora. For each of these daily corpora, I performed a keyword analysis using a reference corpus compiled from the tweets sent on the remaining days. So, for the tweets sent during the pre-conference workshop day (20/07/2015) I used the tweets sent during the rest of the conference (21/07/2015-24/07/2015) as a reference corpus, and so on. The resulting top 10 keywords for each day are given in the table below.

Rank  20/07/2015   21/07/2015   22/07/2015   23/07/2015   24/07/2015
1     CL2015       change       sealey       partington   illness
2     workshop     fireant      granger      duguid       mental
3     pre          biber        animals      gala         news
4     workshops    climate      sylviane     class        literature
5     conference   doom         campaign     dinner       yahoo
6     main         misogyny     heforshe     please       dickens
7     starting     academic     collocation  poster       csr
8     historian    assist       eeg          legal        health
9     day          biber’s      handford     alan         incelli
10    lancaster    bnc          learner      mock         jaworska

The keywords shown in each column outline the most distinctive topics tweeted about during the conference. Italics used here relate back to keywords in the table.
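
For readers who want to try the day-versus-rest keyword procedure themselves, here is a minimal Python sketch. It makes several assumptions that are mine rather than a record of the analysis above: it uses the log-likelihood statistic as the keyness measure (one common choice in CL), it tokenises crudely on whitespace, and all function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case and split on whitespace (a deliberately crude tokeniser)."""
    return text.lower().split()


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness for a word occurring a times in a target corpus
    of c tokens and b times in a reference corpus of d tokens."""
    expected_target = c * (a + b) / (c + d)
    expected_reference = d * (a + b) / (c + d)
    score = 0.0
    if a:
        score += a * math.log(a / expected_target)
    if b:
        score += b * math.log(b / expected_reference)
    return 2 * score


def keywords(daily_tweets, day, top=10):
    """Rank words that are unusually frequent on `day` against all other days.

    `daily_tweets` maps a date string to a list of tweet texts.
    """
    target = Counter(tok for tweet in daily_tweets[day] for tok in tokenize(tweet))
    reference = Counter(
        tok
        for other_day, tweets in daily_tweets.items()
        if other_day != day
        for tweet in tweets
        for tok in tokenize(tweet)
    )
    c, d = sum(target.values()), sum(reference.values())
    if c == 0 or d == 0:
        return []
    scored = []
    for word, a in target.items():
        b = reference[word]
        if a / c > b / d:  # keep only words over-represented in the target
            scored.append((log_likelihood(a, b, c, d), word))
    scored.sort(reverse=True)
    return [word for _, word in scored[:top]]
```

Calling `keywords(daily_tweets, "20/07/2015")` would treat the tweets from that day as the target corpus and the tweets from all other days as the reference corpus, mirroring the design described above.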

On day 1, the pre-conference workshops, including @antlab’s pre-conference corpus tools brainstorming session and @stgries’s pre-conference #R workshop, were popular topics of conversation in what was the smallest subsample of tweets for the week.

Top favourited tweet from day 1:

On day 2, more diverse topics started to emerge. Change became a theme, relating both to Andrew Salway’s talk on discourse surrounding climate change and to a talk given by Doug Biber on historical linguistic change in ‘uptight’ academic texts. Fireant, a new user-friendly tool for efficiently dealing with large databases developed by Laurence Anthony, was also unveiled to the CL masses on day 2, which prompted a flurry of excited tweets [keep track of Laurence’s Twitter page for release]. DOOM and misogyny also became topical following talks by Claire Hardaker and Mark McGlashan on the Discourse of Online Misogyny project. Finally, some excitement followed a paper given by Robbie Love and Claire Dembry about the new Spoken BNC2014. For those interested, keep track of the CASS website for spoken data grants later in the year.

Top favourited tweet from day 2:

Day 3 saw another change of topic, focussing most prominently on Alison Sealey’s talk on the discursive construction of animals in the media, Sylviane Granger’s plenary on learner corpora, a talk on the public’s online reaction to the #HeForShe campaign given by Rosie Knight, and Jen Hughes’ talk on the application of EEG (‘Electroencephalography’) to the study of collocation as a cognitive phenomenon.

Top favourited tweet from day 3:

After 3 days of incredibly interesting talks, corpus linguists were about ready for their gala dinner on day 4. But before all the cheesecake, the CL2015 attendees were excitedly tweeting about the all-important poster session; Alison Duguid’s talk on class; the Geoffrey Leech tribute panel, which included Charlotte Taylor’s paper on mock politeness and ‘bitchiness’ as well as Lynne Murphy and Rachele de Felice’s talk on the differential use of please in BrE and AmE; Alan Partington’s plenary speech on CADS; and papers given by Ruth Breeze, Amanda Potts, and Alex Trklja on the application of CL methods to the study of a broad range of legal language.

Top favourited tweet from day 4:

Day 5 brought #CL2015 to a close, but the number of tweets remained steady. Health was on the agenda, with talks from Ersilia Incelli and Gillian Smith, who both focussed partly on the construction of mental illness/health in the news. News also featured in Monika Bednarek’s talk on news discourse and Antonio Fruttaldo’s analysis of news tickers. Other key topics related to Sylvia Jaworska and Anupam Nanda’s paper on the Corpus Linguistic analysis of Corporate Social Responsibility (CSR), Michaela Mahlberg’s work on the literature of Charles Dickens, and discussion of a corpus of Yahoo answers in the week’s penultimate panel on triangulating methodological approaches.

Top favourited tweet from day 5:

Approaching tweets in this way, it was possible to find out the most salient topics of each day. However, I was also interested in the retweeting behaviour of attendees.

Retweets – what was being talked about?

I looked at the top 10 most frequently retweeted tweets during the conference. Due to the intertextual nature of retweets – they are simply identical reposts of the same content – methods familiar to CL such as word frequency lists may not be as useful in their study. For example, if a few retweets are particularly frequently reposted, the most frequent words will be skewed by the content of the most frequent retweets. Instead, I suggest that retweets themselves should be conceptualised as being individual types in and of themselves that require more qualitative approaches to their interpretations (at least in this context). The top 10 most frequently retweeted tweets including the #CL2015 hashtag are given below:

  Retweet Date Freq
1 RT @EstrategiasEc: Concluimos este viernes con exitoso proceso de postulación @ECLideres VI Prom. #CL2015 con auspicio de @ucatolicagye. ht… 22/07/2015 218
2 RT @perayson: To access the new HT semantic tagger from the @SAMUELSProject see http://t.co/5LFWH8YGAH and http://t.co/BPxcC8pNNK #CL2015 23/07/2015 15
3 RT @UCREL_Lancaster: The #CL2015 abstract book is now available to download from the conference website http://t.co/px9hh3mMNe 21/07/2015 13
4 RT @duygucandarli: Important take-away messages about corpus research in Biber’s plenary talk at #CL2015! http://t.co/xm87Uo1umZ 21/07/2015 11
5 RT @lynneguist: Alan Partington looking at how quickly language changes in White House Press Briefings… #CL2015 http://t.co/jeVjvC8Ym3 23/07/2015 10
6 RT @CorpusSocialSci: .@_paulbaker_ reflecting on a number of approaches to the same data at the Triangulation panel at #CL2015 http://t.co/… 24/07/2015 10
7 RT @CorpusSocialSci: .@vaclavbrezina introduces Graphcoll, a new visualisation tool for collocational networks #CL2015 http://t.co/PM5FxS5N… 22/07/2015 9
8 RT @_ctaylor_: It’s a myth that reference corpora have to larger than target corpus says @antlabjp  #cl2015 22/07/2015 7
9 RT @Loopy63: #CL2015 Call for papers for Intl. Conference on  statistical analysis of textual data 2016 in Nice, France: http://t.co/3JpcAa… 23/07/2015 7
10 RT @vaclavbrezina: A great use of #GraphColl by @violawiegand – #CL2015 poster presentation @TonyMcEnery @StephenWattam http://t.co/uwlMGUY… 24/07/2015 7

The most frequent retweet concerned a Latin American Youth Leadership programme that shared the same #CL2015 hashtag [nb. For next year, Corpus Linguistics conference organisers…]. As you will notice, this retweet occurred on 22/07/2015, but because retweets and tweets are handled separately, it does not interfere with the keyword analysis done on the tweets for the same day.

What do the most frequent retweets highlight? Free tools (GraphColl, HT semantic tagger), free resources (abstract book), plenary talks and more conferences.
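
Treating each retweet as a single type, as argued above, amounts to counting whole retweet texts rather than the words inside them. A minimal sketch in Python (the function name `top_retweets` is hypothetical, and it assumes the retweets are available as a list of text strings):

```python
from collections import Counter


def top_retweets(retweets, n=10):
    """Count whole retweet texts as single types and return the n most frequent."""
    return Counter(text.strip() for text in retweets).most_common(n)
```

Counting in this way keeps one heavily reposted message as a single high-frequency item, instead of letting each of its words dominate a word frequency list.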


With a general idea of what people are talking about and sharing using the #CL2015 hashtag, I was interested to examine the overall activity around #CL2015 and the emergence of discourse communities.

In terms of tweets, the gif below shows how relationships developed over the course of the conference. Every node represents a Twitter account that posted a tweet containing #CL2015 during the period of data collection. The size of each node is dictated by its ‘degree’, i.e. its number of edges: more edges = larger node. The colour of the nodes is determined by ‘betweenness centrality’, which indicates how often a node lies on the shortest paths between other nodes in a network. Nodes with high betweenness centrality speed up the transfer of information through a network because they sit on the shortest routes between other nodes. Nodes with high betweenness centrality are coloured red, those with medium betweenness centrality are yellow, and those with low betweenness centrality are blue. Nodes with intermediary colours (green, orange) have a betweenness centrality somewhere between low and medium or between medium and high. Finally, the colour and size of edges is dictated by ‘weight’. In this example, weight is the frequency of tweets that exist between nodes. Thick red edges connect nodes that send tweets to each other frequently, or where one node mentions the other frequently. Thin blue edges represent low-frequency mentioning relationships, and yellow edges medium-frequency ones. Again, blended colours represent intermediary frequencies and thus, in this case, weight.
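
The two node measures described above can be computed without any specialist software. The sketch below is an illustration rather than the tool used to produce the gif: it assumes an unweighted, undirected network stored as a dictionary mapping each node to the set of its neighbours, and it computes betweenness centrality with Brandes’ algorithm (my choice of method; the function names are hypothetical).

```python
from collections import deque


def degree(adj):
    """Number of edges attached to each node."""
    return {node: len(neighbours) for node, neighbours in adj.items()}


def betweenness(adj):
    """Betweenness centrality for an unweighted, undirected graph.

    `adj` maps every node to the set of its neighbours (each edge must be
    listed in both directions). Uses Brandes' algorithm.
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every pair of nodes was counted once from each end.
    return {v: total / 2 for v, total in score.items()}
```

In a star-shaped network, for example, the central account lies on every shortest path between the peripheral accounts, so it receives the highest betweenness score while the peripheral accounts score zero.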

CL2015 tweets

The tweets network shows that @CorpusSocialSci was – perhaps unsurprisingly – the most prolific and central account in the #CL2015 network. It had the most connections and joined the most individual accounts together. But other users, shown by the nodes in yellow and orange, were also very active in helping to disseminate information more widely. The accounts on the periphery of the network are good examples of ambient affiliation: they use #CL2015 to affiliate but do not directly engage with others by mentioning other users. Moreover, the gif attempts to show the evolution and growth of the network over time, and it also shows that new topics, and networks of interaction relating to those topics, emerged each day. As talks (and news of talks in the network) became topical, people tweeted and shared ideas and notes relevant to those talks. An example of this is the emergence of fireant on 21/07/2015. When it was introduced to delegates, an ad hoc online discourse community formed to spread the news of the new tool, add new information and channel their enthusiasm back to the source.

User Date Tweet
RachelleVessey 2015-07-21 16:50:43 Excellent end to the first day of #CL2015- FireAnt looks like a fantastic programme @antlabjp @DrClaireH can’t wait to try it out!
SLGlaas 2015-07-21 16:54:13 Stupidly excited about #Fireant from @antlabjp  #CL2015
CorpusSocialSci 2015-07-21 16:54:43 Everyone is eagerly wondering when FireAnt will be available. @antlabjp’s answer is hopefully within the next few months. #CL2015
Rosie_Knight 2015-07-21 16:56:40 Amazing talk about FireAnt- can’t wait to use this on my #HeForShe data! @DrClaireH @antlabjp @Mark_McGlashan #CL2015
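
The four tweets above are enough to illustrate how a weighted, directed mention network can be derived from tweet text. The Python sketch below is a simplified illustration under my own assumptions: it counts each (author, mentioned account) pair once per tweet, treats that count as the edge weight, and leaves handles case-sensitive; the function name is hypothetical.

```python
import re
from collections import Counter

# '@' followed by letters, digits, or underscores.
MENTION = re.compile(r"@(\w+)")


def mention_edges(tweets):
    """Count directed (author -> mentioned account) edges; the count is the weight."""
    weights = Counter()
    for author, text in tweets:
        for mentioned in set(MENTION.findall(text)):
            if mentioned != author:
                weights[(author, mentioned)] += 1
    return weights


# The four FireAnt tweets from the table above.
tweets = [
    ("RachelleVessey", "Excellent end to the first day of #CL2015- FireAnt looks like a fantastic programme @antlabjp @DrClaireH can’t wait to try it out!"),
    ("SLGlaas", "Stupidly excited about #Fireant from @antlabjp  #CL2015"),
    ("CorpusSocialSci", "Everyone is eagerly wondering when FireAnt will be available. @antlabjp’s answer is hopefully within the next few months. #CL2015"),
    ("Rosie_Knight", "Amazing talk about FireAnt- can’t wait to use this on my #HeForShe data! @DrClaireH @antlabjp @Mark_McGlashan #CL2015"),
]
weights = mention_edges(tweets)
```

All four authors mention @antlabjp, so that account receives an incoming edge from every participant, which is what channelling enthusiasm “back to the source” looks like in network terms.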

CL2015 retweets

The retweets network shows that @CorpusSocialSci was – again, perhaps unsurprisingly – at the centre of #CL2015 retweeting activity. The retweet network gif shows 2 discrete networks. The right-hand network shows activity at the CL conference; the left-hand network shows the retweeting behaviour of the Latin American Youth Leadership programme mentioned above, which avid conference tweeters may have noticed when keeping track of the #CL2015 hashtag. The left-hand network – a graphic representation of the most retweeted tweet containing #CL2015 shown above – shows 218 users retweeting a single central account. In this network there is no interaction between the users engaged in retweeting this account. This kind of network formation is extremely typical of users retweeting news stories on Twitter. The right-hand network, however, shows a great deal of mutual retweeting, whereby users are engaged on a prolonged basis in sharing each other’s tweets and forming a network of sharing and resharing.
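
The separation into discrete networks can also be found computationally rather than by eye. The sketch below is a simplified illustration, not the procedure used for the gif: it assumes retweets are stored as (retweeting user, text) pairs, that the retweeted account can be read from the conventional “RT @user:” prefix, and that two accounts belong to the same network if any chain of retweets links them. The function names are hypothetical.

```python
import re
from collections import defaultdict

RETWEET = re.compile(r"^RT @(\w+):")


def retweet_edges(retweets):
    """Return (retweeter, original author) pairs from (user, text) records."""
    edges = []
    for user, text in retweets:
        match = RETWEET.match(text)
        if match:
            edges.append((user, match.group(1)))
    return edges


def components(edges):
    """Group accounts into discrete networks, ignoring edge direction."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adj[node] - group)
        seen |= group
        groups.append(group)
    return groups
```

Applied to data like that described above, a procedure of this kind would be expected to return one group containing the single heavily retweeted account and its retweeters, and a second group containing the conference accounts that retweet one another.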


Integrating methods from CL and SNA offers some really interesting possibilities for the analysis of large amounts of social data. Here, I have used keyword analysis to find the most salient topics for each day of the conference, used those topics to find and visualise small but coherent discourse communities, and situated those communities within the wider #CL2015 social network.





Baker, P. (2006) Using Corpora in Discourse Analysis. London: Continuum.

D’Andrea, A., Ferri, F. & Grifoni, P. (2010). An overview of methods for virtual social network analysis. In: A. Abraham, A.-E. Hassanien, & V. Snášel (eds.). Computational Social Network Analysis. London: Springer London, pp. 3–26.

Elgesem, D. & Salway, A. (2015) Traitor, whistleblower or hero? Moral evaluations of the Snowden affair in the blogosphere. In Formato, F. & Hardie, A. (Eds.) Corpus Linguistics 2015 Abstract Book. Paper presented at Corpus Linguistics 2015, Lancaster. Lancaster University. pp 99-101

Foucault, M. (1972) The Archaeology of Knowledge. London: Tavistock.

Grieve, J., Nini, A., Guo, D, & Kasakoff, A. (2015) Recent changes in word formation strategies in American social media. In Formato, F. & Hardie, A. (Eds.) Corpus Linguistics 2015 Abstract Book. Paper presented at Corpus Linguistics 2015, Lancaster. Lancaster University. pp 140-3

Knight, R. (2015) Tweet all about it: public views on the UN’s HeForShe campaign. In Formato, F. & Hardie, A. (Eds.) Corpus Linguistics 2015 Abstract Book. Paper presented at Corpus Linguistics 2015, Lancaster. Lancaster University. pp 201-3

Longhi, J. & Wigham, C. R. (2015) Structuring a CMC corpus of political tweets in TEI: corpus features, ethics and workflow. In Formato, F. & Hardie, A. (Eds.) Corpus Linguistics 2015 Abstract Book. Paper presented at Corpus Linguistics 2015, Lancaster. Lancaster University. pp. 408-9

Hardaker, C. & McGlashan, M. (2015) Twitter rape threats and the discourse of online misogyny (DOOM): from discourses to networks. In Formato, F. & Hardie, A. (Eds.) Corpus Linguistics 2015 Abstract Book. Paper presented at Corpus Linguistics 2015, Lancaster. Lancaster University. pp. 154-6

McGlashan, M. & Hardaker, C. (2015) Twitter rape threats and the discourse of online misogyny (DOOM): using corpus-assisted community analysis (COCOA) to detect abusive online discourse communities. In Formato, F. & Hardie, A. (Eds.) Corpus Linguistics 2015 Abstract Book. Paper presented at Corpus Linguistics 2015, Lancaster. Lancaster University. pp. 234-5

Page, R. (2012). The linguistics of self-branding and micro-celebrity in Twitter: The role of hashtags. Discourse & Communication. 6 (2). pp. 181–201.

Scott, J. (2013). Social Network Analysis. 3rd Ed. London: Sage.

Statache, R., Adolphs, S., Carter, C. J., Koene, A., McAuley, D., O’Malley, C., Perez, E. & Rodden, T. (2015) Descriptive ethics on social media from the perspective of ideology as defined within systemic functional linguistics. In Formato, F. & Hardie, A. (Eds.) Corpus Linguistics 2015 Abstract Book. Paper presented at Corpus Linguistics 2015, Lancaster. Lancaster University. p. 433

Zappavigna, M. (2012). Discourse of Twitter and social media. London: Continuum.

Zappavigna, M. (2013). Enacting identity in microblogging through ambient affiliation. Discourse & Communication. 8 (2). pp. 209–228.

New CASS Partnership to Work on Mapping Online Far-Right Networks

Research staff from the ESRC-Centre for Corpus Approaches to Social Science (CASS), Lancaster University and the International Centre for the Study of Radicalisation and Political Violence (ICSR), King’s College London begin 2015 by undertaking a joint research project which aims to map the networks in which UK-based far-right Twitter accounts operate.

The research team is Joseph Carter (Research Fellow at ICSR), Mark McGlashan (Senior Research Associate at CASS), and Alexander Meleagrou-Hitchens (Head of Research and Information at ICSR). The collaborative research partnership is facilitated by the VOX-Pol Researcher Exchange Programme and CASS, and aims to establish a long-term relationship between the centres and staff.

The partnership brings together complementary research interests that have been explored extensively at both research centres, namely behaviours associated with, amongst others, extremist political ideologies, nationalism, and (cultural) racism. However, the ways in which these phenomena have been explored at the two centres differ widely. The primary focus of the research done at CASS is on the (quantitative) linguistic aspects of, for example, anti-Muslim rhetoric, which are explored primarily through corpus linguistic methods. The research at ICSR, by contrast, has centred on examining political radicalisation with a wider methodological remit, which includes social network analysis, media analysis, and discourse analysis.

The partnership brings together these different research interests on a project which combines aspects of Corpus Linguistics with Social Network Analysis to give both qualitative and quantitative analyses of the online UK far-right as it exists on Twitter. The research will give an overall snapshot of the online behaviour of those who affiliate with the far-right in an online context with findings being channelled towards policy makers, academic and non-academic audiences and into further collaborative research.

Sweepyface: a linguistic profile

This morning brought news of the suicide of a media-branded ‘troll’[1]. Brenda Leyland, the 63-year-old woman behind the @sweepyface Twitter account, a self-proclaimed “researcher” and “anti-McCann” advocate, was found dead at a Marriott hotel in Leicester on Saturday 4th October. She had recently been contacted by a reporter at Sky News regarding her Twitter activity, which frequently suggested that the disappearance of Madeleine ‘Maddie’ McCann in May 2007 was being covered up to the profit of her parents Kate and Gerry McCann.

Part of an online community of “antis” – people who challenge the McCanns’ account of Maddie’s disappearance – Leyland frequently posted under the @sweepyface Twitter handle, tagging her posts with #mccann. “Antis” distinguish themselves from “pros”, or “pro-mccann” advocates, who believe the McCanns’ account of their daughter’s disappearance.

Here, we offer a brief and broad analysis of the content flowing to and from the @sweepyface Twitter account during the entirety of 2014, including the language use and online networks in which @sweepyface operated.

Please note that our analysis does not attempt to validate any claims made by any party with regard to the disappearance of Madeleine McCann.

Who is sweepyface?

As presented on the @sweepyface Twitter account profile:

  • Description: researcher
  • Location: London/Los Angeles

What was sweepyface talking about?

  • Number of tweets sent between Jan-Oct 2014: 2,136
  • Tweets by sweepyface which contained the ‘#mccann’ hashtag: 1,992 (93.26% of all tweets in 2014)
  • Frequency
    • We looked at the most frequent words used by @sweepyface in all the tweets sent during 2014. After cutting out frequent grammatical words (like to, the, is, of, and, etc.), which don’t typically reveal much about content, we found that the most frequent things talked about were:
    • “K & G” – freq 222
      • K & G was used as shorthand to refer to Kate and Gerry McCann, Madeleine McCann’s mother and father. They were one of the most frequent topics of interest
      • ‘Kate’ and ‘Gerry’ also appeared, but less frequently (60 times and 39 times, respectively), and they were never referred to by their full names, Kate McCann/Gerry McCann
    • “shills”
      • ‘Shills’ was the most frequent lexical word used (unlike grammatical/functional words, lexical words have clear semantic meaning – they are word classes like nouns and verbs)
      • It was almost unique to sweepyface – it was characteristic of her particular way of framing “pros”
      • ‘Shills’ was used as a catch-all term to talk about:
        • those who would express “pro-mccann” opinions – “pros” and “shills” appear to be interchangeable
        • those who opposed the opinions of “antis”
      • It was used as an in-/out-group identifier
    • “police”
      • mostly used to question police practices, as in the following tweet from sweepyface:
        • “#mccann  Rarely a month goes by when our police force are not highlighted as having flawed investigations, PJ is no worse than any other”
      • sweepyface also tweeted the police directly, as in the following examples:





19/03/2014 15:32

@metpoliceuk  This is becoming farcical Why will you not consider McCanns as suspects, plenty of clues

08/08/2014 15:31

@gracey52marl @metpoliceuk  #mccann  Not me, I wd like to see Gerrie Nell, prosecute the Mcanns, he wd tear them to shreds
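
The frequency step described above (counting words after removing grammatical words) can be sketched in a few lines of Python. This is an illustration only: the stop list here is a tiny hypothetical one, not the list we used, and multi-word shorthand such as “K & G” would need to be handled separately because this simple tokeniser splits it up.

```python
import re
from collections import Counter

# ASSUMPTION: a very small, illustrative stop list of grammatical words.
STOPWORDS = {
    "to", "the", "is", "of", "and", "a", "in", "as", "not", "by", "our",
    "are", "any", "than", "no", "when", "this", "will", "you",
}


def content_word_frequencies(tweets):
    """Count tokens (words, #hashtags, @handles) that are not in the stop list."""
    counts = Counter()
    for text in tweets:
        for token in re.findall(r"[#@]?\w+", text.lower()):
            if token not in STOPWORDS:
                counts[token] += 1
    return counts
```

Calling `content_word_frequencies(tweets).most_common(20)` on a list of tweet texts gives a ranked list of candidate topics, which then needs to be read in context, as with ‘shills’ and ‘police’ above.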

Who did sweepyface affiliate with and what did they say?

We examined only the top 10 accounts with which @sweepyface interacted most. These accounts were:

Rank  Account name      # of interactions  Group
2     TrulyJudy73       456                pro
3     martin_liz        445                anti
4     siamesey          417                anti
5     RothleyPillow     393                anti
6     AdirenM           323                anti
7     1matthewwright1   314                anti
8     ModNrodder        309                pro
9     B_balou           256                anti
10    basilandmanuel    250                pro

Sweepyface most frequently associated directly with others who were actively engaged in talk about the disappearance of Madeleine McCann, whether as a “pro” or as an “anti”. Moreover, contact between these accounts was evident, and many more accounts were frequently interacting with sweepyface on the same topic.

 [more to follow]

[1] We argue that ‘troll’ as used by the media is defined too broadly – it captures behaviours from low level insults to rape and death threats – and is thus harmful. We adhere instead to the definition of ‘troll’ given here: https://cass.lancs.ac.uk/?p=621

Discourses of Online Misogyny

Indexing reporting and conversations about rape in online social media: India after the 2012 Delhi gang rape


New partners: CASS, Lancaster University and Fields of View, India (Left to right: Onkar Hoysala, Fields of View; Mark McGlashan, CASS; Sruthi Krishnan, Fields of View)

The reporting of incidents of rape of women by (typically groups of) men in India appears to be on the rise. A key incident leading to an increased number of reports occurred in Delhi in December 2012, when a woman and a male friend were kidnapped by a group of six men (including one legal minor) driving what appeared to be an ordinary passenger bus.

Once on board, the male victim was beaten unconscious and gagged by the attackers who then proceeded to violently rape the female victim which included using what was thought to be a rusted car wheel jack handle. So severe were the injuries this caused to her internal organs that she died only days later in a specialist organ transplant hospital in Singapore.

The attack occurred on December 16th; she passed away on December 29th.

All five adult attackers received a death sentence (one died as a result of injuries sustained from beatings by prison inmates) and the minor received a three-year custodial sentence.

The attack was reported on internationally and led to a wave of protests throughout India, which were instrumental in bringing about legal reform in relation to rape and some social changes with respect to securing ‘safe spaces’ for women, including women-only buses. Since the incident, Delhi has become known as the ‘rape capital’ of India and reported incidents are rising. The latest figures show that official reports of rape and sexual assault have risen across India from 16,373 cases in 2002 to 24,206 in 2011 (an increase of 47.8%).

Although an increased journalistic focus on the issue has led to a greater scrutiny of legal practices and public awareness of rape in India, there remain some significant issues. For instance, marital rape is not a criminal offence and the treatment of rape by India’s politicians is of particular concern, with several top officials making troubling and controversial public statements damaging any belief that the Indian government is taking what appear to be endemic cases of rape and sexual assault seriously. As recently as this month, rape has been described as ‘sometimes right, sometimes wrong’ by cabinet minister of the ruling Bharatiya Janata Party (BJP) Babulal Gaur. Another BJP minister, Ramsevak Paikra, has suggested that incidents such as rapes and sexual assaults, rather than being deliberate, “happen accidentally”.

In the wake of these incidents, this research aims to examine a number of important and practical issues in relation to rape in India, including:

  • Legal understandings of acts of rape and sexual assault
  • Social attitudes and understandings in relation to rape and sexual assault
  • Media discourses in relation to rape and sexual assault

CASS partnering with Fields of View, India

In order to investigate the social impact of increased reporting about rape in India since the Delhi gang rape incident, as well as the social influence of controversial statements from institutionally influential individuals, CASS are teaming up with Fields of View, based in India. Fields of View are a research outfit aimed at studying complex social phenomena (e.g. city planning, infrastructure development) and developing intuitive ways to collaborate and engage with the public.

The project will take twitter data and chart the changes over time regarding conversations and reporting of rape in India. The collaborative effort will implement analytical methods developed at CASS to analyse the data which will be interpreted into visual and interactive digital outputs by Fields of View.



Twitter’s reaction to the Benefits Britain live debate

Benefits Street was a series of television programmes broadcast on Channel 4 between 6th January and 10th February 2014 which, as Channel 4 have claimed, “sparked a national conversation about Britain’s welfare system”. The programme focussed on a community of people living in the economically deprived area of Winson Green, Birmingham, and specifically documented the families and individuals that inhabit James Turner Street.

Following the series of pre-recorded, documentary-style programmes (the last episode of which was aired on 16th February 2014), Channel 4 hosted a live debate entitled Benefits Britain which featured a range of public figures and those who were documented in Benefits Street. This report looks at a set of data collected on the date on which the Benefits Britain debate aired (17th February 2014).


The data selected to analyse reaction to this series were tweets, or short ‘micro-blogs’, that offer users the opportunity to voice their opinions and network with other viewers (e.g. using @ replies or # topics) in real time. Tweets were collected from 00:00 on Sunday 16th February 2014 (the date of the final airing of Benefits Street) until 23:59 on Saturday 22nd February 2014 (totalling one calendar week’s worth of Twitter data).

To do this, we used the Twitter API to collect any tweets which contained in their content any of the following terms (note: the terms are not case sensitive, so terms can contain upper or lower case words without affecting data collection):

  • Benefits Britain
  • #BenefitsBritain
  • James Turner
  • Benefits Street
  • #BenefitsStreet

This query returned 81,100 tweets, totalling 1,501,938 words (tokens).
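
The case-insensitive term matching described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it filters tweet texts that have already been downloaded, using simple substring matching, and is not the Twitter API query used to build the corpus. The function names and the sample tweets are hypothetical.

```python
# Search terms taken from the list above; matching ignores case.
SEARCH_TERMS = [
    "Benefits Britain",
    "#BenefitsBritain",
    "James Turner",
    "Benefits Street",
    "#BenefitsStreet",
]


def matches_query(tweet_text, terms=SEARCH_TERMS):
    """Return True if the tweet contains any search term, ignoring case."""
    lowered = tweet_text.casefold()
    return any(term.casefold() in lowered for term in terms)


def filter_tweets(tweets, terms=SEARCH_TERMS):
    """Keep only the tweets whose text matches at least one term."""
    return [text for text in tweets if matches_query(text, terms)]


# Hypothetical sample texts, invented for illustration only.
sample = [
    "Watching #benefitsbritain tonight",
    "BENEFITS STREET was hard to watch",
    "Nothing relevant here",
]
print(filter_tweets(sample))
```

Substring matching of this kind will also catch a term embedded in a longer string, so a real collection may behave slightly differently at the margins.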



The #benefitsbritain hashtag was the most frequent token in the corpus, occurring 45,400 times (3.02% of all tokens). Channel 4 adopted the #BenefitsBritain hashtag immediately following the end of the Benefits Street programme, which had used the #BenefitsStreet hashtag; the latter hashtag was used less often (0.86%) during the period in which the corpus was collected.
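
For readers who want to see how such frequency figures are derived, here is a minimal Python sketch. The 3.02% figure corresponds to the hashtag's share of all tokens (45,400 out of 1,501,938). The tokeniser below is an assumption of mine, a simple regular expression that keeps hashtags and @-mentions as single lower-cased tokens; it is not the tool used for the original analysis, and the sample tweets are hypothetical.

```python
import re
from collections import Counter

# Assumed tokeniser: optional # or @ prefix, then word characters,
# allowing internal apostrophes (e.g. "don't").
TOKEN_PATTERN = re.compile(r"[#@]?\w+(?:['’]\w+)*")


def tokenize(text):
    """Split a tweet into lower-cased tokens, keeping hashtags whole."""
    return TOKEN_PATTERN.findall(text.lower())


def token_frequencies(tweets):
    """Count how often each token occurs across all tweets."""
    counts = Counter()
    for text in tweets:
        counts.update(tokenize(text))
    return counts


def percent_of_tokens(count, total_tokens):
    """Express a raw count as a percentage of the corpus size in tokens."""
    return 100 * count / total_tokens


# Figures reported in the post: 45,400 occurrences in 1,501,938 tokens.
print(round(percent_of_tokens(45_400, 1_501_938), 2))

# Hypothetical sample texts, invented for illustration only.
sample = ["#BenefitsBritain this is awful", "Not fair at all #benefitsbritain"]
print(token_frequencies(sample).most_common(1))
```

Lower-casing before counting is what lets #BenefitsBritain and #benefitsbritain be treated as one token.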

Several concerns are frequently expressed by users of the #BenefitsStreet hashtag. The word people is the most frequent ‘content word’ in tweets containing the #BenefitsBritain hashtag, occurring in 15.2% of those tweets, and it occurs most frequently in the word cluster people on benefits. This cluster is associated with a number of verbs, including are, should, and have, which appear to be involved in evaluating who people on benefits are as well as their (perceived) behaviours.
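
A word cluster here is simply a run of adjacent words. As a rough sketch of how clusters around a node word such as people can be counted, the Python below slides a three-word window over each tweet and keeps the windows that contain the node. The function name, the whitespace tokenisation and the sample tweets are illustrative assumptions rather than the procedure actually used.

```python
from collections import Counter


def cluster_counts(tweets, node, size=3):
    """Count runs of `size` adjacent words that include the node word."""
    counts = Counter()
    for text in tweets:
        # Simplifying assumption: lower-case and split on whitespace.
        tokens = text.lower().split()
        for start in range(len(tokens) - size + 1):
            window = tokens[start:start + size]
            if node in window:
                counts[" ".join(window)] += 1
    return counts


# Hypothetical sample texts, invented for illustration only.
sample = [
    "some people on benefits are good people",
    "not all people on benefits are lazy",
]
clusters = cluster_counts(sample, "people")
print(clusters.most_common(1))
```

Clusters are counted within each tweet separately, so a window never spans the boundary between two tweets.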

Who people on benefits are

Some appear to be challenging the stereotype that benefits claimants are workshy or lazy:

  • #benefitsbritain Some people on benefits are good people who’ve gone through a bad time not everyone on benefits are scumbags.
  • don’t think people should comment on things until they have been in that situation. Not all people on benefits are lazy etc!#BenefitsBritain
  • #BenefitsBritain am so annoyed that that show has stigmatised all people on benefits are scum when we all aren’t IT’S SO ANNOYING!!!!!

Some argue the absolute opposite:

  • #BenefitsBritain kiss my ass i think most people on benefits are lazy and need to get a damn job!!!! Cut all benefits for able bodies people

Or assume that claiming benefits is a result of a lack of skills or underlying criminality:

  • Half the people on benefits are unemployable stop there benefit and they commit crime and it costs more to imprison them #BenefitsBritain

And some are somewhat more ambivalent:

  • #BenefitsBritain Not all people on benefits are lazy, but if it becomes a lifestyle its dangerous territory, idle minds are the devils work.

What people on benefits do

In terms of evaluating what people on benefits do, a number of users question the (perceived) behaviours of those claiming benefits:

  • Fail to see why some people on benefits are allowed to spend their money on drink, cigarettes and drugs #BenefitsBritain
  • watching the debate #BenefitsBritain most people on benefits have a criminal record now who wants to give them people a chance no one

Others propose possible restrictions on (perceived) social and spending behaviours:

  • Why don’t people on benefits have vouchers instead of money? Then they wouldn’t spend it on drink and drugs #BenefitsBritain
  • I stand by the fact that people on benefits should not have children when they cant afford to feed themselves. #BenefitsBritain
  • Agree with the guy who said people on benefits should be given food stamps #BenefitsBritain

Or suggest certain behavioural conditions be fulfilled in order to claim benefits:

  • People on benefits should be made to go out&do something before they get money volunteering or something!! #BenefitsBritain #BenefitsStreet
  • Active people on benefits should earn their benefits through voluntary work to assist the community #BenefitsBritain
  • People on benefits should only get paid if they do voluntary/training work. Then there is some progress in their lives. #BenefitsBritain

Some argue that people are workshy:

  • People on benefits have lacked the ability to work hard in education there for getting a low paid job or none at all #BenefitsBritain
  • #BenefitsBritain all people on benefits should get of their arse and work like the rest of us do everyday

Or have a grudge against those who work:

  • What is it that some people on benefits have against working class people who’ve been successful? #benefitsdebate #BenefitsBritain


Two specific names were also frequent in tweets using the #BenefitsBritain hashtag.

The first is the host of the Benefits Britain debate, Richard Bacon. Mainly, those who spoke about Bacon brought his abilities as a host into question. One of the more creative and less direct insults was:

  • Richard Bacon is a cross between Jeremy Kyle & Kilroy! @Channel4 would have been better off getting @rickedwards1 hosting #BenefitsBritain

The second person featuring frequently was (White) Dee, a prominent personality in the Benefits Street programme. The response to her was mainly positive, although there were some negative reactions:

  • #BenefitsBritain always the governments fault -what nonsense Dee will never look at herself and see what a lazy scroungers she is
  • My view on #BenefitsBritain Richard Bacon is a cock oh and White Dee is a sweaty lazy cow


Aside from the #BenefitsBritain hashtag, the next most frequent token in the corpus was the determiner ‘the’. The fourth most frequent token was the word ‘to’, which can be interpreted either as a preposition or as part of infinitive verbs. Looking at clusters in which to occurs revealed that in fact to occurred within a number of infinitive verb forms. I look here at the 3 most frequent: to be, to work, and to get, to see how infinitives work within the #BenefitsBritain tweet corpus and what ideas they are used to express.
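
One simple way to find candidate infinitives is to count the word that immediately follows to. The sketch below does this in Python. Note the assumption: it cannot tell the infinitive marker from the preposition, so the output still needs manual inspection (or part-of-speech tagging) before anything is treated as an infinitive. The function name, the tokenising pattern and the sample tweets are hypothetical.

```python
import re
from collections import Counter

# Assumed tokeniser: keeps hashtags, mentions and internal apostrophes.
TOKEN_PATTERN = re.compile(r"[#@]?\w+(?:['’]\w+)*")


def words_after(tweets, node="to"):
    """Count the words that directly follow the node word in each tweet."""
    counts = Counter()
    for text in tweets:
        tokens = TOKEN_PATTERN.findall(text.lower())
        # Pair every token with its right-hand neighbour.
        for current, following in zip(tokens, tokens[1:]):
            if current == node:
                counts[following] += 1
    return counts


# Hypothetical sample texts, invented for illustration only.
sample = [
    "this is going to be interesting",
    "people want to work",
    "you need experience to get a job",
    "they need to be weaned off",
    "going to london tomorrow",
]
followers = words_after(sample)
print(followers.most_common(1))
```

In the sample, london is counted alongside be, work and get, which shows why the raw list has to be checked by hand.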

To be

The infinitive verb to be was frequently found being used in a number of interesting ways.

Users were excited that the Benefits Debate was going to be interesting:

  • #BenefitsBritain this is going to be interesting!

And frequently challenged the stereotype that only poor people are drug addicts, as with this retweet:

  • ‘Billionaire’s Row residents are as likely to be drug addicts as people on Benefits Street’ says MP Chris Bryant http://t.co/JG750GrJE5

When found in the cluster need to be, people and work again became central to the debate:

  • Finally people talking about politics. Reality is we need to be paying people a living wage vote labour #benefitsstreet
  • Benefits is like a Government Drug. These people need to be weaned off the drug and get a job! #BenefitsBritain #BenefitStreet

To work

The infinitive to work not only most frequently occurs in the word cluster want to work, but is also closely associated with different ways of referring to people, either through pronouns (they, everyone, I), or the most frequent ‘content word’ in the corpus, people. As such, the formation want to work is found in tweets expressing general opinions about the desirability of work:

  • Some people do want to work but it’s not as simple sick people are getting harassed to work when they are not fit #BenefitsBritain
  • Majority of disabled and unemployed people want to work #BenefitsBritain #BenefitStreet
  • I am so sick of hearing, make work pay, incentivize people to work. People want to work. The jobs don’t pay a living wage #benefitsbritain

Moreover, want to work is strategically used in straw man arguments against the idea that people want to work:

  • These people clearly want to work? Really??? has he watched the same programme? #BenefitStreet #BenefitsBritain

It also frequently collocates with negative forms such as don’t and doesn’t, in examples such as the following, which express the idea that people claiming benefits see work as undesirable:

  • Let’s be real most of the people on the programme don’t really want to work anyway #BenefitsBritain
  • If we’re being fair…there are also A LOT of people on benefits who definitely DON’T want to work… #BenefitsDebate #BenefitsBritain
  • #BenefitsStreet there is an inherent problem with some ppl in this country; they don’t want to work! Send them overseas; no benefits
  • #BenefitsBritain Not all people on benefits want to work just come #skelmersdale for the next series. Wont need no editing or bribes!!

To get

To get is the third most frequent infinitive verb formation and occurs most frequently in the phrase to get a job. Underpinning how this phrase is used is a moralised debate surrounding (un)employment which naturalises and elevates the status of employment and the employed and alienates and derides unemployment and the unemployed; having a job makes you good, having no job makes you bad. This is borne out by the data.

Uses of the phrase include talk about the difficulty of getting a job:

  • “#BenefitsBritain makes a lot of valid points, you need experience to get a job, you need experience to get experience! Can never win!”

Structural/political issues:

  • #benefitsstreet #BenefitsBritain is all the fault of #thatcher who closed everything down then #cameron who makes it difficult to get a job

And corruption:

  • #BenefitsBritain to get a job it’s not all what you know its who you know #thesystemsfucked

As well as reactions against pressure to work within a climate where work is hard to find:

  • These guys on Benefits Britain thinking it’s so easy to get a job. Get back to reality you stuck up twats! #BenefitsBritain #BenefitsStreet

However, most of the uses of the to get a job phrase target jobseekers and construct them in relation to prejudices and assumptions about the (un)employed:

  • Why is everyone too scared to stand up and say ‘work harder to get a job/off drugs/off drink’? #BenefitsBritain
  • #benefitsstreet this show makes me so angry.. Get off your fat ass and try to get a job instead of sponging off the country
  • Fuck this Benefits Street debate is making me angry. Lazy twats need to get a fucking job.
  • #BenefitsBritain kiss my ass i think most people on benefits are lazy and need to get a damn job!!!! Cut all benefits for able bodies people


This data highlights a kind of moralisation of (un)employment, in which the ideologies underpinning this moralisation are both reinforced and challenged. The data also reveals a number of apparently stable linguistic formations used to talk about unemployed benefits claimants, which appear to expose aspects of the ideological underpinnings of the debate.

Two plead guilty over Twitter rape threats


Tuesday 7th January saw John Nimmo and Isabella Sorley plead guilty to sending messages “menacing” in nature to feminist campaigner Caroline Criado-Perez and Walthamstow MP Stella Creasy via multiple Twitter accounts.

In July 2013, Criado-Perez had been successful in campaigning for author Jane Austen to appear on the £10 bank note. Shortly after, in the final days of July and spilling into August, a torrent of abuse was directed at Criado-Perez, including numerous threats to sexually abuse, rape, torture, and kill the campaigner. After lending Criado-Perez support on the social networking site, Creasy was also targeted by abusive users.

The prosecution identified abusive traffic from 86 different Twitter accounts, several of which belonged to the defendants.

The court heard from prosecutor Alison Morgan that Criado-Perez felt “significant fear” due to the menacing nature of the tweets, which have had “life changing psychological effects”. Creasy reported that both her personal and professional life were affected by the messages.

Sorley held her face in her hands as the prosecutor read aloud some of her offending tweets, which included:

“You’re wasting shits loads of time because you can’t handle rape threats, pathetic! Rape is the last of your worries!!!!”

“rape?! I’d do a lot worse things than rape you!!”

“I will find you and you don’t want to know what I will do when I do, you’re pathetic, kill yourself beforeI i do #godie”

When arrested in October of 2013, Sorley admitted to sending the abusive tweets, saying that she was “bored” and that “I was off my face on drink” at the time, although she accepted that some tweets could be perceived as death threats.

Nimmo, on the other hand, was arrested in July of 2013 after having been tracked down by a Newsnight reporter and gave no comment when arrested. His defence claimed that he is a “social recluse” whose “social interaction, social life, is online” as a result of being “systematically bullied at secondary school, both physical and verbal”. As a result of social exclusion, his defence claims, Nimmo has “no social life, no friends, he strives for popularity” and his “outrageous comments [were] made for retweets”.

Both Sorley and Nimmo pleaded guilty under Section 127 of the Communications Act (2003) and are to appear before Westminster Magistrates’ Court later this month.

CASS investigates

I travelled to the court to witness the trial as part of a research project on the Discourse of Online Misogyny (DOOM) here at CASS. Our initial aim is to investigate the ways in which language was used as part of the threats made against Caroline Criado-Perez and Stella Creasy on Twitter. Building on this, we will produce sophisticated analytical tools to provide critical analyses of language and other kinds of behaviours which emerge during instances of online abuse (such as network building).

CASS outputs


Claire Hardaker, Lecturer in English Language and Principal Investigator of the DOOM project, appeared on the 07/01/2014 edition of Newsnight.

CASS: Briefing


You can also read a summary of this work in a complimentary CASS: Briefing.

View and download it here: Researching online abuse: the case of trolling     

Challenging Homophobia and Homophobic Bullying through Children’s Literature: a Parliamentary event

On July 16th 2013 I hosted an event supported by ESRC/CASS and the Lancaster University FASS-Enterprise Centre on Challenging Homophobia and Homophobic Bullying through Children’s Literature.

The event aimed to start a conversation about the use of children’s literature as a resource for effectively challenging homophobia and homophobic bullying and included attendees ranging from MPs and charity spokespersons to prominent academics and educational practitioners to children’s publishers and literature retailers. All who attended were experienced in issues of homophobia and homophobic bullying or with issues relating to inclusive children’s literature.


The 2-hour event, which took around 6 months of organisation to bring together, included 6 presentations and a roundtable discussion, and turned out to be a success both as an opportunity for knowledge exchange and for networking.


The presentations were structured into 2 sessions. The first session focussed on issues of homophobia and homophobic bullying. The second session focussed on issues of using children’s literature as a means for addressing issues of inclusion.


Challenging Homophobia & Homophobic Bullying through Children’s Literature

Homophobic bullying, whether verbal, physical, or cyber, is a significant and prevalent issue in schools[ref]Rivers, Ian. (2011) Homophobic Bullying. Oxford: Oxford University Press.[/ref]. Stonewall, a leading charity in campaigning for lesbian, gay and bisexual (LGB) rights, reported in 2012 that 55% of LGB children in British schools experience bullying[ref]http://www.stonewall.org.uk/documents/school_report_2012(2).pdf[/ref]. They also reported earlier in 2007 the results of a YouGov survey of over 2,000 primary and secondary teachers who suggest that it isn’t just LGB young people that experience homophobic bullying and harassment but that it is a wider issue that is experienced by young people regardless of their sexual orientation.[ref]http://www.stonewall.org.uk/other/startdownload.asp?openType=forced&documentID=1695[/ref]

Bullying is sadly still a ubiquitous element of many students’ school experiences[ref]Poteat, V. Paul., Mereish, Ethan. H., DiGiovanni, Craig. D. & Scheer, Jillian. R. (2013) ‘Homophobic Bullying’. In, Rivers, Ian & Duncan, Neil (Eds.) Bullying: experiences and discourses of sexuality and gender. Oxon: Routledge. Pp. 75-90.[/ref] and has immediate and long-term detrimental effects for its victims[ref]Cowie, Helen. (2013) ‘Immediate and long-term effects of bullying’. In, Rivers, Ian & Duncan, Neil (Eds.) Bullying: experiences and discourses of sexuality and gender. Oxon: Routledge. Pp. 10-8.[/ref].

Those who are bullied are affected in terms of their physical, psychological and social health and well-being: loneliness, depression, anxiety, low self-esteem; psychosomatic symptoms such as headaches, abdominal pain, and sleeplessness; poor school grades; premature alcohol and tobacco consumption. These are all generally associated with children being bullied.

So important is the specific issue of homophobic bullying that the independent children’s services inspectorate, Ofsted, which is responsible for the regulation of quality in maintained schools and academies in the UK, recently added ‘exploring the school’s actions to prevent homophobic bullying’ to its list of briefings used during school inspections; a very positive move in the prevention of homophobic bullying.

Children’s literature is a key educational resource

Children’s literature is already almost inextricably linked to education. Literature is already used in schools to encourage and teach literacy, as well as things like sex and relationships education (SRE); citizenship; and Personal, Social, Health, and Economics (PSHE) education. In recent years, children’s literature has also been recognised as a credible and useful resource for preventing homophobic bullying and creating inclusive culture[ref]No Outsiders Project Team (2010) Undoing Homophobia in Primary Schools. Staffordshire: Trentham Books Ltd.[/ref], although LGBT-inclusive books are yet to become a staple of school libraries. So, why not integrate or produce LGBT-inclusive resources that help schools prevent homophobic bullying?

Doing something about it

There is a growing recognition of the need, want, and support for resources aimed at young people to promote inclusive, anti-homophobic practices, but there is still little being done to address the lack of resources. So, with the help of the FASS-Enterprise Centre and the ESRC Centre for Corpus Approaches to Social Sciences (CASS) at Lancaster, on July 16th a day event will take place in the Palace of Westminster. At this event, I will host a number of key spokespersons from the diverse areas of politics, publishing, retail, charities, and academia, and recent work relating to homophobia, homophobic bullying and children’s literature will be shared and discussed in order to better challenge homophobia and homophobic bullying through children’s literature.

Work to be presented on the day

Stonewall’s head of Education, Wes Streeting, and Professor of Human Development at Brunel University, Ian Rivers, will present recent work on homophobic bullying in schools. Paul Baker, Professor of English Language and Linguistics at Lancaster University, will present the results of work done using corpus linguistic methods on changing representations of homosexuality in the British press. Mark McGlashan, PhD student at Lancaster University, will present work on current representations of same-sex parent families in picturebooks. Beth Cox will present her work as part of Inclusive Mind, a collaborative network of consultants and campaigners which aims to increase socially inclusive representations in children’s literature. Finally, teacher trainer, consultant and writer Mark Jennett will present work on using children’s literature as a resource for inclusion in schools.

Funded by