I was honoured to attend the 25th Anniversary Conference for the Muslim News on 15th September. The event was organized by the Society of Editors, and the Daily Telegraph provided the venue – the spectacular Merchant Taylors’ Hall in the City of London. The event began with a speech by Bob Satchwell, Executive Director of the Society of Editors, and a welcoming speech by Lord Black of the Telegraph Media Group. Following that, Fatima Manji of Channel 4 News introduced me and I gave the morning’s keynote speech, discussing the work I did with Paul Baker and Costas Gabrielatos (Discourse Analysis and Media Attitudes: The Representation of Islam in the British Press) on the representation of Islam and Muslims in the UK press. I was also very happy to be able to present some early findings from a follow-up study that Paul Baker and I are currently doing, supported by CASS and the Muslim NGO MEND, looking at how things have developed since our work was published. This is based on approximately 80 million more words of data, composed of all UK national newspaper articles mentioning Muslims and Islam in the period 2010-2015.

The audience included a mixture of journalists, newspaper editors and TV news reporters and editors, as well as representatives from many faith groups and NGOs. The research was very well received. After the talk a panel was convened to discuss the work and take questions from the audience. The panel included John Wellington, the managing editor of the Mail on Sunday; Doug Wills, managing editor of the London Evening Standard and the Independent group of newspapers; and Sue Ryan, former managing editor of the Daily Telegraph and manager of the trainee programme for the Mail group. It was a real privilege to be able to discuss our work with them and I found them to be open to criticism and ready to consider change. One point of interest that emerged from the discussion, I thought, was that the press are often criticized for their use of language when that usage is current in general English. While this puts the press in the spotlight, it also means that at times they can be in the vanguard of discussion and change in language use, as the recent discussion of the use of the word ‘migrant’ in the UK media has shown. This makes an engagement with media language all the more important for academic researchers.

Following this panel was a second panel, chaired by Fatima Manji, composed of the editors of ITN news and BBC news (Robin Elias and James Stephenson) as well as Channel 4’s Home Affairs correspondent Simon Israel. Julian Petley, author of Pointing the finger: Islam and Muslims in the British media, gave academic weight to this panel’s discussion. A very thought-provoking discussion ensued about how to achieve a more inclusive and representative newsroom, which demonstrated, once again, that the media was willing to engage in discussion and was prepared to embrace change.

After lunch the final session, chaired by Ehsan Masood of Research Fortnight, followed a contribution from Jonathan Heywood of Impress on a Leveson-compliant media watchdog that Impress are developing. A lively debate followed, led by the head of IPSO, Sir Alan Moses. Sir Alan was joined by prominent editors from The Sunday Times (Eleanor Mills) and The Observer (Stephen Pritchard) as well as the Managing Editor of the London Evening Standard and Independent Group, Will Gore. A key tension that was highlighted by Sir Alan Moses in the debate was between what in principle may be desirable and what is achievable in reality. He also made the important point that we have to decide as a society where we want regulation to end and a softer form of social regulation to begin. I finished the afternoon with a brief and rewarding discussion of my work with Sir Alan.

The event was a rare and precious opportunity to showcase academic research to a range of key stakeholders and for that opportunity I am very grateful both to MEND and to Muslim News.

CL2015 – Presenting for the First Time at an International Conference

In July 2015 I was lucky enough to give a presentation at the Corpus Linguistics 2015 conference at Lancaster University. This was my first time presenting at an international conference, and I was nervous but very excited. I thought I would use this blog post to elaborate on my experience of presenting at a conference for the first time, and hopefully give some advice to people who may be worrying about giving their first conference presentation (or let those of you who are already well practised at this see how my experience compares to yours)!

All the way back in January 2015 I put together my abstract to submit to the conference. This was quite a tricky process as the abstracts for CL2015 were required to be 750-1500 words in length. This meant that more than a simple summary was needed, but that I also couldn’t go into a great amount of detail about my method or results. After many re-drafts I managed to find a balance between the two, and with crossed fingers and toes I submitted my abstract. Crossing my fingers must have worked (or maybe it was all the re-drafting…) because I was delighted to find out that I had been accepted to present at the conference! The feedback from the reviewers was mostly positive, but, even when reviewers suggest lots of changes, it’s important to see this as a way to make your work even better rather than as negative feedback.

After the elation of being accepted had worn off, I had a sudden realisation of “Oh my God, I actually have to stand up and talk about corpus linguistics in front of a whole room of actual professional corpus linguists!” However, after lots of practice in front of my PhD supervisors and fellow students who would be presenting at the conference I began to feel more confident. That was until the first day of the conference arrived and I found out that I would be presenting in one of the biggest lecture theatres in the university!

After a few moments of worry about whether anyone would be able to hear me, or whether anyone would even come, I thought “Well, there’s no point being nervous, you’ve practised as much as you can, let’s just enjoy it!” And, as is usually the case when you’ve spent a long time worrying about everything that could go wrong, everything went absolutely fine. I had a good-sized audience, my presentation worked, and I managed to answer all of the questions put to me. Something I found very helpful whilst presenting was to have a set of cue cards with very short bullet-point notes for each slide – I barely looked at them, but it was reassuring to know that they were there in case I completely froze up! The only thing that didn’t go quite to plan was my timing; I was a couple of minutes short of the allotted 20 minutes for presenting. However, over the course of the conference I learnt that this is vastly preferable to being over the time limit. Giving a presentation which is too long makes you seem unrehearsed and leaves you with no time for questions or comments. It can also ruin the timings for all of the other presenters following you, so make sure you rehearse with a stopwatch beforehand!

I received some lovely feedback after the presentation both in person and on Twitter. This allowed me to meet lots of other people at the conference with similar research interests to mine, and gave me lots of ideas for future research.

Overall, presenting at CL2015 was a very enjoyable and extremely valuable experience. It taught me that, with the right amount of preparation, giving a presentation to experts in your field is not something to worry about, but rather an opportunity to showcase your work and help it progress. My top tips for those of you worrying about presenting at a conference would be:

1) Don’t rush your abstract: you won’t get the chance to worry about presenting if your abstract doesn’t showcase why your work is important and interesting.

2) Practise with friends, colleagues, anyone who will listen! And time yourself with a stopwatch – you don’t want to be the one that the chair has to use the scary ‘STOP TALKING NOW’ sign on!

3) Use cue cards if it makes you feel more confident. However, DON’T write a script – this will make you seem over-rehearsed and you won’t be as interesting to listen to.

4) Put your Twitter handle on your presentation slides so that you can network and people can give you feedback online as well as in person.

5) See presenting as a valuable chance to have your work evaluated by experts in your field, and enjoy it!

Do my experiences of presenting at a conference for the first time match yours? Have you found these tips helpful? Let us know @corpussocialsci!

#CL2015 social media roundup: Using Corpus Linguistics to investigate Corpus Linguists talking about Corpus Linguistics


Corpus Linguistics 2015 – CL2015 – is the largest conference of its kind and this year drew over 250 attendees from all over the world to present work outlining the state of Corpus Linguistics (CL) at large, leading-edge technology and methods, and setting the agenda for years to come.

Of particular interest to me was a small but important streak of enquiry running through the conference, which is also becoming more prevalent in CL as a whole: a focus on corpora collected from online sources such as blogs and social media (Elgesem & Salway 2015; Grieve, et al. 2015; Hardaker & McGlashan 2015; Knight 2015; Longhi & Wigham 2015; McGlashan & Hardaker 2015; Statache, et al. 2015). The Internet now offers great opportunities for the collection and interrogation of large amounts of data – big data, even – and the rapid compilation of specialised corpora in ways previously impossible.

I focus here on social media data, specifically data collected from Twitter. Sampling data from Twitter, like a lot of other online sources, offers the opportunity to collect what people are saying (the content of their posts; tweets) but also a huge amount of metadata about the date, time, user, shared content (e.g. hyperlinks, retweets), interactional information, etc. relating to those posts. As Corpus Linguists, we therefore get the data we sample for – posts containing the thing(s) we are interested in – as well as other social information about the content creators and their social networks that we may or may not be interested in. Indeed, concerns about the kinds of metadata included in and attached to online posts have sparked a great deal of debate about the ethics of collecting and using publicly posted online content, though these concerns are not discussed here. Instead, the potential for online ethnography is explored. In order to do this, I pair familiar CL research methods with methods from Social Network Analysis (SNA) that are more explicitly focussed on social networks and examining the myriad ways people affiliate with each other.

Theory & Methods: Corpus-assisted Community Analysis (CoCoA)

Corpus-assisted Community Analysis (CoCoA) is a multimethodological approach to the study of online discourse communities that combines methods from Discourse Analysis (DA), CL, and SNA.

Corpus-assisted Discourse Analysis

I predominantly draw on Baker (2006) in my approach to corpus-assisted DA, seeing discourse in a Foucauldian sense as forms of social practice: “practices which systematically form the objects of which they speak” (Foucault 1972: 49). In particular, I am interested in the incremental effect of discourse. Baker suggests, “a single word, phrase or grammatical construction on its own may suggest the existence of a discourse” (2006: 13). However, in order to investigate how quantitatively typical or pervasive a discourse is within a discourse community, numerous examples of linguistic instantiations of that discourse are required to make a claim about its cumulative effect (ibid.). Following Baker, I argue here that corpora and CL techniques enable this kind of quantitative examination of discourse.

Social Network Analysis

SNA implements notions from graph theory to formally model and describe the properties of relationships between objects of study such as people and institutions. A graph (or ‘sociogram’) is a representation of people or institutions of interest as ‘nodes’ and the relationships between them as a set of lines known as ‘edges’; a graph is built by representing “a set of lines [‘edges’] connecting points [‘nodes’]” (Scott 2013: 17). To interpret graphs, graph theory contributes “a body of mathematical axioms and formulae that describe the properties of the patterns formed by the lines [‘edges’]” (Scott 2013: 17). One of these axioms is ‘directionality’. Directed graphs can encode both symmetric and asymmetric relations (D’Andrea, et al. 2010: 12). Asymmetric relationships are those in which nodes are connected by an edge that has a direction of flow from one node to another, as illustrated by the relations between A and C, and C and B in Fig. 1. Symmetric relationships are those in which an edge connects two nodes but is bidirectional – the direction of relation flows both ways – as illustrated by the relationship between A and B in Fig. 1. Directed relationships on Twitter include followership relations and the act of mentioning – i.e. including another user’s handle (e.g. @CorpusSocialSci) – in tweets.


Figure 1: A simple directed graph
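As a rough illustration of the distinction drawn around Fig. 1, the sketch below stores a directed graph as a set of (source, target) pairs and separates symmetric from asymmetric relations. The text only says that the relations between A and C and between C and B are asymmetric, so the direction of those two arrows (A to C, C to B) is an assumption made for the example, as is the function name classify.

```python
# A minimal sketch of a directed graph as a set of (source, target) pairs.
# Assumption: the arrows run A -> C and C -> B; the text does not state
# their direction, only that these two relations are asymmetric.
edges = {("A", "C"), ("C", "B"), ("A", "B"), ("B", "A")}


def classify(edges):
    """Split directed edges into symmetric pairs and asymmetric edges."""
    symmetric = set()
    asymmetric = set()
    for source, target in edges:
        if (target, source) in edges:
            # The relation flows both ways, so record the unordered pair once.
            symmetric.add(frozenset((source, target)))
        else:
            asymmetric.add((source, target))
    return symmetric, asymmetric


symmetric, asymmetric = classify(edges)
print(sorted(tuple(sorted(pair)) for pair in symmetric))
print(sorted(asymmetric))
```

On Twitter data the same structure would hold follower or mention relations, with account names in place of the letters.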

Undirected graphs represent identical, symmetric relationships between nodes which might be the result of nodes sharing reciprocal attitudes or “because they have a common involvement in the same activity” (Scott 2013: 17). Fig. 2 gives a graphical representation of an undirected graph.


Figure 2: A simple undirected graph

Directed and undirected (‘ambient’) kinds of affiliation are both understood here as being distinct forms of discursively constructed social practices. Furthermore, I adopt the term ‘ambient affiliation’ from the work of Zappavigna on the use of social media in the formation of community and identity (Zappavigna 2012; Zappavigna 2013). Ambient affiliation is about the functionalities of social media platforms that enable users “to commune with others without necessarily engaging in direct conversational exchanges” (Zappavigna 2013: 223-4). Therefore, ambient affiliation is about people exhibiting the same behaviours or sharing the same qualities but without directly interacting with each other. This notion closely approximates the notion of an ‘undirected’ graph. In developing the theory of ambient affiliation Zappavigna draws on Page’s work on hashtags. Page refers to hashtags as “a search term” (2012: 183). Hashtags – a string of characters (usually a word or short phrase) unbroken by spaces or non-alphabetic/non-numeric characters (excl. underscores ‘_’) preceded by ‘#’ (e.g. #YOLO) – are used as metadiscursive markers of the topic of a tweet. Page goes on to argue that, “the kind of talk which aggregate around hashtags […] involve multiple participants talking simultaneously about the same topic, rather than individuals necessarily talking with each other in dyadic exchanges that resemble a conversation” (2012: 196). As such, Page suggests that hashtags destabilise conventional adjacency pairs characteristic of many forms of human dialogue and give a new way for humans to interact on a topic of mutual interest.
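The working definition of a hashtag given above (a ‘#’ followed by an unbroken run of letters, digits or underscores) maps quite directly onto a regular expression. The sketch below is only an approximation under that definition: Twitter’s own tokenisation rules are more involved, and folding tags to lower case (so that #CL2015 and #cl2015 count as the same tag) is an assumption of the example rather than something the analysis here states.

```python
import re

# '#' followed by one or more word characters (letters, digits, underscore).
HASHTAG = re.compile(r"#(\w+)")


def hashtags(tweet):
    """Return the hashtags in a tweet, lower-cased, without the leading '#'."""
    return [tag.lower() for tag in HASHTAG.findall(tweet)]


tweet = "Stupidly excited about #Fireant from @antlabjp  #CL2015"
print(hashtags(tweet))
```

Grouping tweets by the tags returned here is one simple way to find the accounts that are affiliated ‘ambiently’, i.e. through a shared tag rather than through direct exchange.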


I collected all tweets and retweets including the official hashtag of the Corpus Linguistics 2015 conference – #CL2015 – posted from the date of the first pre-conference workshop (20/07/2015) through until the final day of the conference (24/07/2015). To do this, I used the R based Twitter client ‘twitteR’ to access the Twitter API. The resulting data amounted to:

          Total number
Tweets    671
Retweets  1025


Date        Tweets  Retweets
20/07/2015  57      76
21/07/2015  128     169
22/07/2015  152     370
23/07/2015  176     241
24/07/2015  158     169
Totals      671     1025

The tweets corpus contained around 10,000 words in total.


The data contained some ‘noise’ mainly caused by other people using the same #CL2015 hashtag to talk about another event occurring during the period of the conference. However, as I will show in the analysis, the methods enable researchers to focus only on the communities they are interested in.


Tweets – what was being talked about?

To find out what people were talking about day-to-day, I created daily tweet corpora. With each of these daily corpora, I performed a keyword analysis using a reference corpus compiled from the tweets sent on the remaining days. So, for the tweets sent during the pre-conference workshop day (20/07/2015) I used the tweets sent during the rest of the conference (21/07/2015-24/07/2015) as a reference corpus, and so on. The resulting top 10 keywords for each day are given in the table below.

Rank  20/07/2015  21/07/2015  22/07/2015   23/07/2015  24/07/2015
1     CL2015      change      sealey       partington  illness
2     workshop    fireant     granger      duguid      mental
3     pre         biber       animals      gala        news
4     workshops   climate     sylviane     class       literature
5     conference  doom        campaign     dinner      yahoo
6     main        misogyny    heforshe     please      dickens
7     starting    academic    collocation  poster      csr
8     historian   assist      eeg          legal       health
9     day         biber’s     handford     alan        incelli
10    lancaster   bnc         learner      mock        jaworska

The keywords shown in each column outline the most distinctive topics tweeted about during the conference. Italics used here relate back to keywords in the table.
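For readers who want to try the ‘each day against the rest’ design themselves, here is a minimal sketch. The keyness statistic is not named in this post, so log-likelihood is used purely as an assumption; the tokeniser, the function names and the toy tweets are also invented for illustration and are not the conference data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case a tweet and split it into word tokens (a crude tokeniser)."""
    return re.findall(r"\w+", text.lower())


def log_likelihood(a, b, c, d):
    """Keyness of a word seen a times in c target tokens, b times in d reference tokens."""
    expected_target = c * (a + b) / (c + d)
    expected_reference = d * (a + b) / (c + d)
    score = 0.0
    if a > 0:
        score += a * math.log(a / expected_target)
    if b > 0:
        score += b * math.log(b / expected_reference)
    return 2 * score


def daily_keywords(tweets_by_day, top_n=10):
    """For each day, rank words against a reference built from all other days."""
    counts = {day: Counter(tok for tweet in tweets for tok in tokenize(tweet))
              for day, tweets in tweets_by_day.items()}
    keywords = {}
    for day, target in counts.items():
        reference = Counter()
        for other_day, other_counts in counts.items():
            if other_day != day:
                reference.update(other_counts)
        c, d = sum(target.values()), sum(reference.values())
        scored = []
        for word, a in target.items():
            b = reference[word]
            # Keep only words relatively more frequent in the target day.
            if a * d > b * c:
                scored.append((log_likelihood(a, b, c, d), word))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        keywords[day] = [word for _, word in scored[:top_n]]
    return keywords


# Invented toy data, only to show the shape of the input.
toy = {
    "20/07/2015": ["pre conference workshop starting", "workshop day at lancaster"],
    "21/07/2015": ["fireant looks fantastic", "talk on climate change", "fireant release soon"],
}
print(daily_keywords(toy, top_n=3))
```

With corpora as small as a single day of tweets, low-frequency words can score highly, so a minimum frequency threshold would be a sensible addition in practice.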

On day 1, the pre-conference workshops, including @antlab’s pre-conference corpus tools brainstorming session and @stgries’s pre-conference #R workshop, were popular topics of conversation in the smallest subsample of tweets for the week.


On day 2, more diverse topics start to emerge. Change became a theme, relating both to Andrew Salway’s talk on discourse surrounding climate change and to a talk given by Doug Biber on historical linguistic change in ‘uptight’ academic texts. Fireant, a new user-friendly tool for efficiently dealing with large databases developed by Laurence Anthony, was also unveiled to the CL masses on day 2, which prompted a flurry of excited tweets [keep track of Laurence’s Twitter page for release]. DOOM and misogyny also became topical following talks by Claire Hardaker and Mark McGlashan on the Discourse of Online Misogyny project. Finally, some excitement followed a paper given by Robbie Love and Claire Dembry about the new Spoken BNC2014. For those interested, keep track of the CASS website for spoken data grants later in the year.


Day 3 saw another topic change, focussing most prominently on Alison Sealey’s talk on the discursive construction of animals in the media, Sylviane Granger’s plenary on learner corpora, a talk on the public’s online reaction to the #HeForShe campaign given by Rosie Knight, and Jen Hughes’ talk on the application of EEG (‘Electroencephalography’) to the study of collocation as a cognitive phenomenon.


After 3 days of incredibly interesting talks, corpus linguists were about ready for their gala dinner on day 4. But before all the cheesecake, CL2015 attendees were excitedly tweeting about the all-important poster session; Alison Duguid’s talk on class; the Geoffrey Leech tribute panel, which included Charlotte Taylor’s paper on mock politeness and ‘bitchiness’ as well as Lynne Murphy and Rachele de Felice’s talk on the differential use of please in BrE and AmE; Alan Partington’s plenary speech on CADS; and papers given by Ruth Breeze, Amanda Potts, and Alex Trklja on the application of CL methods to the study of a broad range of legal language.


Day 5 brought #CL2015 to a close, but the number of tweets remained steady, with health on the agenda in talks from Ersilia Incelli and Gillian Smith, who both focussed partly on the construction of mental illness/health in the news. News also featured in Monika Bednarek’s talk on news discourse and Antonio Fruttaldo’s analysis of news tickers. Other key topics related to Sylvia Jaworska and Anupam Nanda’s paper on the Corpus Linguistic analysis of Corporate Social Responsibility (CSR), Michaela Mahlberg’s work on the literature of Charles Dickens, and discussion of a corpus of Yahoo answers in the week’s penultimate panel on triangulating methodological approaches.


Approaching tweets in this way, it was possible to find out the most salient topics of each day. However, I was also interested in the retweeting behaviour of attendees.

Retweets – what was being talked about?

I looked at the top 10 most frequently retweeted tweets during the conference. Due to the intertextual nature of retweets – they are simply identical reposts of the same content – methods familiar to CL such as word frequency lists may not be as useful in their study. For example, if a few retweets are particularly frequently reposted, the most frequent words will be skewed by the content of the most frequent retweets. Instead, I suggest that retweets themselves should be conceptualised as being individual types in and of themselves that require more qualitative approaches to their interpretations (at least in this context). The top 10 most frequently retweeted tweets including the #CL2015 hashtag are given below:

  Retweet Date Freq
1 RT @EstrategiasEc: Concluimos este viernes con exitoso proceso de postulación @ECLideres VI Prom. #CL2015 con auspicio de @ucatolicagye. ht… 22/07/2015 218
2 RT @perayson: To access the new HT semantic tagger from the @SAMUELSProject see http://t.co/5LFWH8YGAH and http://t.co/BPxcC8pNNK #CL2015 23/07/2015 15
3 RT @UCREL_Lancaster: The #CL2015 abstract book is now available to download from the conference website http://t.co/px9hh3mMNe 21/07/2015 13
4 RT @duygucandarli: Important take-away messages about corpus research in Biber’s plenary talk at #CL2015! http://t.co/xm87Uo1umZ 21/07/2015 11
5 RT @lynneguist: Alan Partington looking at how quickly language changes in White House Press Briefings… #CL2015 http://t.co/jeVjvC8Ym3 23/07/2015 10
6 RT @CorpusSocialSci: .@_paulbaker_ reflecting on a number of approaches to the same data at the Triangulation panel at #CL2015 http://t.co/… 24/07/2015 10
7 RT @CorpusSocialSci: .@vaclavbrezina introduces Graphcoll, a new visualisation tool for collocational networks #CL2015 http://t.co/PM5FxS5N… 22/07/2015 9
8 RT @_ctaylor_: It’s a myth that reference corpora have to larger than target corpus says @antlabjp  #cl2015 22/07/2015 7
9 RT @Loopy63: #CL2015 Call for papers for Intl. Conference on  statistical analysis of textual data 2016 in Nice, France: http://t.co/3JpcAa… 23/07/2015 7
10 RT @vaclavbrezina: A great use of #GraphColl by @violawiegand – #CL2015 poster presentation @TonyMcEnery @StephenWattam http://t.co/uwlMGUY… 24/07/2015 7
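A ranking like the one above can be produced by counting whole retweet texts rather than the words inside them, which is one way of treating each retweet as a ‘type’. The sketch below assumes the retweets are already available as a list of strings; the function name and the sample strings are invented placeholders, not tweets from the #CL2015 data.

```python
from collections import Counter


def top_retweets(retweets, n=10):
    """Rank whole retweet texts (not words) by how often they were reposted."""
    counts = Counter(text.strip() for text in retweets)
    return counts.most_common(n)


# Invented placeholder retweets, repeated to simulate reposting.
sample = (
    ["RT @user_a: first example tweet #CL2015"] * 3
    + ["RT @user_b: second example tweet #CL2015"] * 2
    + ["RT @user_c: third example tweet #CL2015"]
)

for text, freq in top_retweets(sample, n=2):
    print(freq, text)
```

Counting at this level sidesteps the skew described above, because a heavily reposted retweet contributes one highly frequent type instead of inflating the frequency of every word it contains.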

The most frequent retweet concerned a Latin American Youth Leadership programme that shared the same #CL2015 hashtag [nb. For next year, Corpus Linguistics conference organisers…]. As you will notice, this retweet occurred on 22/07/2015, but as retweets and tweets are dealt with separately, the retweet does not interfere with the keyword analysis done for the same day on the tweets.

What do the most frequent retweets highlight? Free tools (GraphColl, HT semantic tagger), free resources (abstract book), plenary talks and more conferences.


With a general idea of what people are talking about and sharing using the #CL2015 hashtag, I was interested to examine the overall activity around #CL2015 and the emergence of discourse communities.

In terms of tweets, the gif below shows how relationships developed over the course of the conference. Every node represents a Twitter account that posted a tweet containing #CL2015 during the period of data collection. The size of each node is dictated by its ‘degree’, i.e. its number of edges: more edges means a larger node. The colour of the nodes is determined by ‘betweenness centrality’, which indicates how central a node is in a network. Nodes with high betweenness centrality speed up the transfer of information through a network because they lie on the shortest paths between other nodes. Nodes with high betweenness centrality are coloured red, those with medium betweenness centrality yellow, and those with low betweenness centrality blue. Nodes with intermediary colours (orange, green) have a betweenness centrality somewhere between low and medium or between medium and high. Finally, the colour and size of edges is dictated by ‘weight’. In this example, weight is the frequency of tweets that exist between nodes. Thick red edges connect nodes that frequently send tweets to each other, or where one node frequently mentions another. Thin blue edges represent low-frequency mentioning relationships, and yellow edges medium-frequency ones. Again, blended colours represent intermediary frequencies and thus, in this case, weight.
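To make the two node measures described above concrete, here is a small standard-library sketch that computes degree and betweenness centrality for an undirected, unweighted graph. It uses Brandes’ algorithm, a standard method for betweenness; nothing here is claimed to be the software used to draw the gif, and the account names are hypothetical.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj):
    """Degree = number of edges attached to each node."""
    return {node: len(neighbours) for node, neighbours in adj.items()}


def betweenness(adj):
    """Unnormalised betweenness centrality via Brandes' algorithm."""
    score = {node: 0.0 for node in adj}
    for s in adj:
        order = []                           # nodes in order of distance from s
        preds = {n: [] for n in adj}         # predecessors on shortest paths
        sigma = {n: 0 for n in adj}          # number of shortest paths from s
        dist = {n: -1 for n in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {n: 0.0 for n in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair is visited from both ends, so halve the totals.
    return {node: value / 2 for node, value in score.items()}


# Hypothetical accounts: one hub that three others are each linked to.
adj = build_graph([("hub", "user_a"), ("hub", "user_b"), ("hub", "user_c")])
print(degree(adj))
print(betweenness(adj))
```

In this toy star the hub sits on every shortest path between the other three accounts, so it gets both the highest degree and the only non-zero betweenness, which is the pattern the red, central nodes in the gif are meant to convey.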

CL2015 tweets

The tweets network shows that @CorpusSocialSci was – perhaps unsurprisingly – the most prolific and central account in the #CL2015 network. It had the most connections and joined the most individual accounts together. But other users, shown by the nodes in yellow and orange, were also very active in helping to disseminate information more widely. The accounts on the periphery of the network are good examples of ambient affiliation. They use #CL2015 to affiliate but do not directly engage with others by mentioning other users. Moreover, the gif attempts to show the evolution and growth of the network over time, and also shows that new topics, and networks of interaction relating to those topics, emerged each day. As talks (and news of talks in the network) became topical, people tweeted and shared ideas and notes relevant to those talks. An example of this is the emergence of fireant on 21/07/2015. When it was introduced to delegates, an ad hoc online discourse community formed to spread the news of a new tool, add new information and channel their enthusiasm back to source.

User Date Tweet
RachelleVessey 2015-07-21 16:50:43 Excellent end to the first day of #CL2015- FireAnt looks like a fantastic programme @antlabjp @DrClaireH can’t wait to try it out!
SLGlaas 2015-07-21 16:54:13 Stupidly excited about #Fireant from @antlabjp  #CL2015
CorpusSocialSci 2015-07-21 16:54:43 Everyone is eagerly wondering when FireAnt will be available. @antlabjp’s answer is hopefully within the next few months. #CL2015
Rosie_Knight 2015-07-21 16:56:40 Amazing talk about FireAnt- can’t wait to use this on my #HeForShe data! @DrClaireH @antlabjp @Mark_McGlashan #CL2015
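The four tweets above are enough to show how mentions become directed, weighted edges. In this sketch the tweet texts are copied from the table, while the regular expression for handles and the helper name mention_edges are assumptions made for illustration.

```python
import re
from collections import Counter

# '@' followed by word characters approximates a Twitter handle.
MENTION = re.compile(r"@(\w+)")

tweets = [
    ("RachelleVessey", "Excellent end to the first day of #CL2015- FireAnt looks like a fantastic programme @antlabjp @DrClaireH can’t wait to try it out!"),
    ("SLGlaas", "Stupidly excited about #Fireant from @antlabjp  #CL2015"),
    ("CorpusSocialSci", "Everyone is eagerly wondering when FireAnt will be available. @antlabjp’s answer is hopefully within the next few months. #CL2015"),
    ("Rosie_Knight", "Amazing talk about FireAnt- can’t wait to use this on my #HeForShe data! @DrClaireH @antlabjp @Mark_McGlashan #CL2015"),
]


def mention_edges(tweets):
    """Count directed (author, mentioned account) edges; the count is the weight."""
    edges = Counter()
    for author, text in tweets:
        for target in MENTION.findall(text):
            edges[(author, target)] += 1
    return edges


edges = mention_edges(tweets)
# Number of distinct accounts mentioning each target.
in_degree = Counter(target for _, target in edges)
print(in_degree.most_common())
```

All four authors mention @antlabjp, which is the ‘channelling back to source’ described above; an account that tweets with #CL2015 but never appears in any edge would be a case of ambient affiliation.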

CL2015 retweets

The retweets network again shows that @CorpusSocialSci was – again, perhaps unsurprisingly – at the centre of #CL2015 retweeting activity. The retweet network gif shows 2 discrete networks. The right-hand network shows activity at the CL conference; the left-hand network shows the retweeting behaviour of the Latin American Youth Leadership programme mentioned above. Avid conference tweeters may have noticed this when keeping track of the #CL2015 hashtag. The left-hand network – a graphic representation of the most retweeted tweet containing #CL2015 shown above – shows 218 users retweeting a single central account. In this network there is no interaction between the users engaged in retweeting this user. This kind of network formation is extremely typical of users retweeting news stories on Twitter. The right-hand network, however, shows a great deal of mutual retweeting, whereby users are engaged on a prolonged basis in sharing each other’s tweets and forming a network of sharing and resharing.
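Separating two discrete networks of this kind, and telling a ‘star’ of one-way retweets from a mutually retweeting community, can be sketched with a simple connected-components search. The edge list, the account names and the is_star test below are all invented for illustration; they are not the #CL2015 data or the method used to draw the gif.

```python
def components(edges):
    """Group accounts into discrete networks, ignoring edge direction."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen = set()
    groups = []
    for start in adj:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(adj[node] - seen)
        groups.append(group)
    return groups


def is_star(group, edges):
    """True if everyone in the group retweets one account that retweets nobody."""
    sources = {u for u, v in edges if u in group}
    targets = {v for u, v in edges if u in group}
    return len(targets) == 1 and targets.isdisjoint(sources)


# Hypothetical (retweeter, original author) pairs.
retweets = [
    ("fan_1", "hub_account"), ("fan_2", "hub_account"), ("fan_3", "hub_account"),
    ("ling_a", "ling_b"), ("ling_b", "ling_a"), ("ling_c", "ling_a"), ("ling_b", "ling_c"),
]

groups = components(retweets)
for group in groups:
    print(sorted(group), "star" if is_star(group, retweets) else "mutual")
```

Filtering out components that are unrelated to the conference is one way to set aside hashtag ‘noise’ of the kind caused by the other event using #CL2015.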


Integrating methods from CL and SNA offers some really interesting possibilities for the analysis of large amounts of social data. Here, I have used keyword analysis to find the most salient topics for each day of the conference, used those topics to find and visualise small but coherent discourse communities, and situated those communities within the wider #CL2015 social network.


CASS PhD student in Moscow to attend the XVI April International Academic Conference on Economic and Social Development

I recently got the opportunity to travel to Moscow to attend the XVI April International Academic Conference on Economic and Social Development at the National Research University – Higher School of Economics (HSE). This conference covered a wide variety of fields including Sociology, Geography, and Technology, and, on the last day of the conference, there was a seminar specifically for Linguistics PhD students. The aim of this seminar was to allow students from Russia and other countries to exchange ideas, and to introduce students from around the world to HSE.

At the seminar, there were presentations from 10 PhD students and these covered a variety of Linguistics topics including Grammar, Semantics, Sign Language, and Cognitive Linguistics. There were also some presentations on Corpus Linguistics: one which discussed semantic role labelling for the Russian language based on the Russian FrameBank, and another which discussed building a corpus of Soviet poetry. I found it interesting to see corpus analyses based on the Russian language, and it was also interesting to see the use of the ‘web as corpus’. This introduced me to tools that I haven’t used before, such as the Google N-Gram Viewer.

In the afternoon, I gave a presentation entitled The collocation hypothesis: Evidence from self-paced reading. This was the first time I had ever given a conference presentation and I was really pleased to have an audience that seemed interested in my work. The audience was composed of PhD students, some undergraduate students from the Linguistics Department at HSE, researchers from other fields who had presented at the conference on the previous days, as well as a few senior academics who gave me some really useful feedback.

The conference was held at the central building of HSE and, the day before the seminar, an MA student in Computational Linguistics kindly gave me a tour of the Linguistics Department. It was interesting to see that their classes are all seminar-based and I particularly liked the way they had a common room where all members of the department, including undergraduates, postgraduates, and lecturers, go between classes in order to socialise or do work. Here, I got the chance to speak to some undergraduates and postgraduates and I was shown some of the corpora that were compiled at that department, such as the Corpus of Modern Yiddish, the Bashkir Poetic Corpus, and the Russian Learner Corpus of Academic Writing. I was also told about a project called Tolstoy Digital, which involved making a corpus of Tolstoy’s works. It was interesting to hear about the unique problems that were faced when compiling this corpus. For instance, Tolstoy used an older orthography so this had to be translated to the modern form before the corpus could be tagged and parsed.

When speaking to members of the department, it was also interesting to discuss how some of their work links to some of the work carried out at CASS and the Linguistics Department at Lancaster University. For example, Elena Semino’s work on pain questionnaires seemed to link closely to an article written by members of HSE entitled Towards a typology of pain predicates (Reznikova et al. 2012). This article discusses the way in which the semantic domain of pain is largely composed of words borrowed from other semantic domains.

After showing me around the department, the MA student, Natalia, showed me around some of the main sights in central Moscow. I really appreciated this as I got to see some of Moscow from a local’s perspective as well as getting to visit some of the key sights that I was looking forward to seeing, such as the Bolshoi Theatre. Whilst in Moscow, I also went to see Swan Lake at the Kremlin Theatre of Classical Russian Ballet. This was an amazing experience because I had always wanted to see a Russian ballet and, although I had already seen Swan Lake several times, this was definitely the best version I had ever seen. Overall I had a brilliant time in Moscow and I am really grateful to the Higher School of Economics for funding and organising the trip.

CASS affiliated papers to be given at the upcoming 5th International Language in the Media Conference

In two weeks, several scholars affiliated with the Centre will be heading south to attend the 5th International Language in the Media Conference, taking place this year at Queen Mary, University of London. We are particularly excited about the theme — “Redefining journalism: Participation, practice, change” — as well as the conference’s continued prioritization of papers on “language and class, dis/ability, race/ethnicity, gender/sexuality and age; political discourse, commerce and global capitalism” (among other important themes). As a taster for those of you who will be joining us in London and an overview for those who are unfortunately unable to make it this year, abstracts of the CASS affiliated papers to be given at the conference are reproduced below.

“I hate that tranny look”: a corpus-based analysis of the representation of trans people in the national UK press

Paul Baker

In early 2013, two high-profile incidents involving press representation of trans people resulted in claims that the British press were transphobic. For example, Jane Fae wrote in The Independent that ‘the trans community… is now a stand-in for various minorities… and a useful whipping girl for the national press… trans stories are only of interest when trans folk star as villains’ (1/13/13). This paper examines Fae’s claims by using methods from corpus linguistics in order to identify the most frequent and salient representations of trans people in the national UK press. Corpus approaches use computational tools as an aid in human research, offering a good balance between quantitative and qualitative analyses. My analysis is based upon previous corpus-based research where I have examined the construction of gay people, refugees and asylum seekers and Muslims in similar contexts.

Using a 660,000-word corpus of news articles about trans people published in 2012, I employ concordancing techniques to examine collocates and discourse prosodies of terms like transgender, transsexual and tranny, in order to identify repetitive patterns of representation that occur across newspapers. I compare such patterns to sets of guidelines on language use by groups like The Beaumont Society, and discuss how certain representations can be enabled by the Press Complaints Commission’s Code of Practice. While the analysis found that there are very different patterns of representation around the three labels under investigation, all of them showed a general preference for negative representations, with occasional glimpses of more positive journalism.

“I think we’d rather be called survivors”: A corpus-based critical discourse analysis of the semantic preferences of referential strategies in Hurricane Katrina news articles as indicators of ideology

Amanda Potts

In times of great crisis, people often rely upon the discourse of powerful institutions to help frame experiences and reinforce established ideologies (van Dijk 1985). Selection of referential strategies in such discourses can reveal much about our society; for instance, some words have the power to comfort addressees but further oppress the referents. Taking a corpus-based critical discourse analytical approach, in this paper I explore the discursive cues of underlying ideology (of both the publications and perhaps the assumed audience) with special attention to journalists’ referential and predicational strategies (Reisigl and Wodak 2000). Analysis is based on a custom-compiled 36.7-million-word corpus of American news print articles concerning Hurricane Katrina.

A variety of forms of reference have been identified in the corpus using part-of-speech tagged word lists. Collocates of each form of reference have been calculated and automatically assigned a semantic tag by the UCREL USAS tagger (Archer et al. 2002). Semantic categories represented by the highest proportion of collocates overall have been identified as the most salient indicators of ideology.

The semantic preferences of the referential strategies are found to be quite distinct. For instance, resident prefers the M: Movement semantic category, whereas collocates of evacuee tend to fall under N: Numbers. This may prime readers to interpret Gulf residents and evacuees as large, threatening, ‘invading’ masses (often in conjunction with negative water metaphors such as flood). The highest collocate semantic category for victim, displaced, and survivor is S: Social actions, states and processes, indicating that the [social] experiences of these referents (such as being helped or stranded, or linked to social identities such as wife) are foregrounded rather than their numbers or movement.

Finally, the plummeting frequency of refugee following a unique debate in the media over the word’s meaning and even its semantic preference will also be discussed as an illustrative example of how unconscious language patterns can sometimes come to the fore in contested usage and influence the journalistic lexicon. Following from this, a more considered use of referential strategies is recommended, particularly in the media, where this could encourage heightened compassion for, and understanding of, those gravely affected by catastrophic events.
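For readers unfamiliar with the method, the collocate-and-tag procedure described in the abstract above can be sketched roughly as follows. This is a toy illustration in Python, not the analysis pipeline used in the paper: the window size, the choice to count collocate types rather than tokens, and the tiny `TOY_LEXICON` (a stand-in for a real semantic tagger such as USAS) are all assumptions.

```python
from collections import Counter

def collocates(tokens, node, window=2):
    """Count the words occurring within `window` tokens of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                counts[tokens[j]] += 1
    return counts

def category_proportions(colloc_counts, lexicon):
    """Proportion of tagged collocate types falling into each semantic category."""
    cats = Counter(lexicon[w] for w in colloc_counts if w in lexicon)
    total = sum(cats.values())
    return {c: n / total for c, n in cats.items()} if total else {}

# Hypothetical mini-lexicon: N = Numbers, M = Movement (labels borrowed from the abstract).
TOY_LEXICON = {"thousands": "N", "hundreds": "N", "arrived": "M", "fled": "M"}

tokens = "thousands of evacuees arrived as hundreds of evacuees fled".split()
counts = collocates(tokens, "evacuees")
props = category_proportions(counts, TOY_LEXICON)
print(counts)
print(props)
```

In a real study the collocates would normally be filtered by a statistical association measure before tagging, and untagged function words such as "of" would be handled by the tagger rather than simply skipped.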

Journalism through the Guardian’s goggles

Anna Marchi

‘Journalism is an intensely reflexive occupation, which constantly talks to and about itself’ (Aldridge and Evetts 2003: 560). Journalists create interpretative communities (Zelizer 2004) through the discourses they circulate about their profession; the meaning and role of journalism are constituted through daily performance (Matheson 2003) and can be studied by means of the self-reflexive traces in texts. That is, they can be detected and studied in a newspaper corpus.

This paper proposes a corpus-assisted discourse analysis (Partington 2009) of the ways journalists represent their trade in their own news-work. The focus of the research is on one newspaper in particular: the Guardian. Previous research (Marchi and Taylor 2009) suggested that among British broadsheets the Guardian is by far the most interested in other media, as well as the most inclined to talk about itself. Using newspaper data from 2005, a particularly relevant year in the newspaper’s biography (it changed format from traditional broadsheet to Berliner) and rich with self-reflexivity, I examine the discursive behavior of media-related lexical items in the corpus (such as journalist, reporter, hack, media, newspaper, press, tabloid), exploring the ways in which the Guardian conceptualises the role of the news media, how it represents professional values and the divide between good and bad journalism, and, ultimately, how it constructs its own identity. The study relies on the typical tools of corpus linguistics research – collocation analysis, keywords analysis, concordance analysis – and aims at a comprehensive description of the data, following the principle of total accountability (McEnery and Hardie 2012: 17), while keeping track of the broader extralinguistic context. From a methodological point of view, this work encourages interdisciplinary contamination and a serendipitous approach to the data and wishes to offer an example of how corpus-based research can contribute to the academic investigation of journalism across disciplines.

Visit the conference website for more details, including a list of plenary speakers.