The Scottish referendum – did it unite the Guardian and the Mail?

The Guardian and the Mail are very different newspapers. The Guardian is a left-leaning liberal broadsheet while the Mail is a more popular right-leaning ‘middle-market’ newspaper. Generally, they can be relied on to disagree with one another on a range of social, economic and political issues. However, both newspapers supported the recent “No” campaign during the Scottish Independence referendum, which raises a few interesting questions – how did their discourse around Scottish independence contrast? Did they use similar arguments and language, or did they still manage to retain their individual identities?

To explore these questions, we built corpora of the Mail and Guardian (and their Sunday editions) from 18 June 2014 until 18 September 2014 (the three months leading up to the referendum on Scottish independence) by collecting all articles which contained the term Scottish directly followed by independence, referendum, vote or poll.
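The selection rule described above can be sketched as a regular expression. This is only an illustration: the pattern, the case-insensitive matching and the function name `matches_query` are assumptions, not the query actually used to collect the articles.

```python
import re

# Hypothetical query: "Scottish" directly followed by one of the four terms.
# Case-insensitive matching is an assumption made for this sketch.
QUERY = re.compile(
    r"\bScottish\s+(independence|referendum|vote|poll)\b",
    re.IGNORECASE,
)

def matches_query(text: str) -> bool:
    """Return True if the article text contains the search pattern."""
    return QUERY.search(text) is not None

# Made-up example texts, not real articles.
print(matches_query("The Scottish referendum is three months away."))
print(matches_query("Scottish voters are still undecided."))
```

Note that the word boundary at the end means plural forms such as "polls" are not matched by this sketch; whether the original search included them is not stated.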

We then examined the keywords which emerged when each corpus of articles was compared against the one-million-word BE06 Corpus of general British English. A keyword is simply a word which occurs much more often in a corpus when compared against a larger reference corpus. Corpus tools (we used AntConc) can quickly calculate keywords by conducting statistical tests on all the words in the corpus. We looked at the strongest (in terms of statistical saliency) 100 or so keywords for each corpus, and then compared the two sets of keywords to see which occurred just in the Guardian or just in the Mail, but also which were shared by both. The table below shows the keywords that were found.
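For readers curious about what such a statistical test looks like, here is a minimal sketch of a keyword calculation using the log-likelihood statistic, which is one commonly used keyness measure. It is an assumption that this resembles the settings we used in the corpus tool; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness score for one word across two corpora."""
    total = freq_study + freq_ref
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    ll = 0.0
    if freq_study > 0:
        ll += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll

def keywords(study_tokens, ref_tokens, top_n=100):
    """Rank words that are relatively more frequent in the study corpus."""
    study, ref = Counter(study_tokens), Counter(ref_tokens)
    n_study, n_ref = len(study_tokens), len(ref_tokens)
    scored = []
    for word, freq in study.items():
        # Keep only words over-represented in the study corpus.
        if freq / n_study > ref[word] / n_ref:
            scored.append((word, log_likelihood(freq, n_study, ref[word], n_ref)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

# Toy data, invented purely for illustration.
study_tokens = ["referendum"] * 10 + ["the"] * 10
ref_tokens = ["the"] * 50 + ["cat"] * 50
print(keywords(study_tokens, ref_tokens))
```

In this toy example "the" is equally frequent (in relative terms) in both corpora, so only "referendum" comes out as a keyword.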

Guardian keywords: austerity, Britain, Brown, campaigners, Carrell, country, devolution, EU, festival, Holyrood, ISIS, nation, nationalism, north, oil, political, politicians, politics, polling, polls, powers, Saturday, says, secretary, Severin, voted, votes, voting, weather, YouGov

Keywords in both newspapers: Alex, Alistair, all, August, bank, BBC, better, border, Cameron, campaign, currency, Darling, David, debate, Ed, Edinburgh, election, former, Games, Glasgow, has, independence, independent, July, Labour, leader, London, Miliband, minister, MPs, nationalists, No, party, poll, prime, pro, referendum, Salmond, Scotland, Scots, Scottish, September, SNP, tax, Thursday, Together, Tory, UK, undecided, union, vote, voters, Westminster, will, would, Yes

Mail keywords: Balmoral, border, cabinet, CBI, chairman, crisis, investors, James, Kingdom, MP, PM, prince, Queen, said, shares, sterling, Tories, Tuesday, twitter, uncertainty, United, warned, week, year


This table isn’t really an analysis though – we need to explore the keywords in more detail by reading the articles that each keyword appears in and getting a sense for how and why they were used. This is achieved by looking at concordance lines, although we can also expand each line to read the entire article. Here are some of our preliminary findings.
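As a rough illustration of what a concordance line is, the sketch below prints each occurrence of a search word with a few words of context on either side. The function name `concordance`, the context width and the example sentence are all made up for illustration; real corpus tools offer much richer sorting and display options.

```python
def concordance(tokens, node, width=4):
    """Return (left context, node word, right context) for each hit."""
    lines = []
    node = node.lower()
    for i, token in enumerate(tokens):
        if token.lower() == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines

# Invented example sentence, tokenised on whitespace.
tokens = "investors warned that uncertainty is the enemy of investment".split()
for left, word, right in concordance(tokens, "uncertainty", width=2):
    print(f"{left:>20}  {word}  {right}")
```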

The Mail was much more concerned than the Guardian about how the vote would impact on the Royal Family, with its keywords including Prince, Queen and Balmoral. Much is made of the Queen’s ‘neutrality’, her relationship with David Cameron, her ‘soft power’ in influencing the vote, her carefully calculated comments, and characteristically, what she is wearing (“a turquoise outfit and hat” in one article). The Queen is also described as receiving daily updates from Balmoral.

The Mail also refers to the keyword uncertainty a lot more than the Guardian, particularly appearing concerned about how the progress of the campaign is bad for markets, investors, businesses and pension holders who don’t like uncertainty, e.g. “uncertainty is the enemy of investment”. The use of the Mail keyword crisis also pins the Scottish vote to the idea of a crisis – the vote could trigger an “EMU-style currency crisis within the UK” but there could also be a “leadership crisis” for both Labour and the Conservatives. Another somewhat worrying Mail keyword is warned, with the Mail reporting various people and businesses (Stagecoach, Paul Krugman, Goldman Sachs, Standard Life, John Major, Mark Carney, Doug Flint) issuing warnings about a range of dire consequences that could occur if Scotland gains independence.

Perhaps surprisingly, twitter is a keyword for the Mail. This is interesting given that the Mail’s editor, Paul Dacre, dismissed the ‘firestorm’ of tweets around a previous Mail article by Jan Moir, which in 2009 attracted the highest number of complaints ever made to the Press Complaints Commission.[1] But the Mail now seems to have accepted the importance of Twitter and views tweets as newsworthy. To wit, it reports on Rupert Murdoch’s Twitter behaviour, as well as tweets from people who disliked the Better Together advertising campaign (#PatronisingBTLady). The Mail is especially disapproving of “tartan trolls” who use Twitter to attack celebrities like JK Rowling who endorse the No vote.

How about the Guardian? One keyword it used was nationalism, which at first glance may suggest that the Guardian wished to critique the “Yes” voters as nationalists. However, there were cases where writers like Billy Bragg and George Monbiot argued that the label of nationalism was unfairly used to obscure ‘self determination’. One journalist approvingly refers to the lack of ‘braveheart nationalism’ in the campaign. Other journalists do attribute nationalism to some Scottish people, but this is felt to be due to London being out of touch and inward looking. Nationalism either doesn’t exist in the campaign, or when it does, can be excused.

Another Guardian keyword is austerity, with some journalists citing views that the current government’s austerity programme was helping the Yes camp. This could be an opportunity for the Guardian to blame the government’s economic policy for breaking up the union, but generally this is not done. Instead, it is argued that a Yes vote would not end austerity, but merely impose it from Holyrood rather than Westminster.

Unlike the Mail, the Guardian doesn’t spend as much time reporting the warnings of ‘financial experts’, although the keyword oil was interesting, occurring with reference to North Sea Oil reserves and revenues. In a number of articles, the Guardian foregrounds claims by Sir Ian Wood that Alex Salmond has exaggerated North Sea Oil reserves by up to 60%. In terms of perspectivation, Sir Ian Wood’s position is given precedence over Salmond’s e.g. Wood is described as ‘one of the most influential figures in the Scottish oil industry’ and other people are described as quoting his position too. A woman who claims that the No campaigners have ‘downplayed the amount of oil we have left’ is subtly positioned as greedy: ‘It was “our oil”, she said…’ and thus her argument is weakened somewhat. At the end of the same article, another opinion, given by a local Lib Dem chairman who is described as a ‘marine engineer’ appears to be given more precedence: he says ‘Nobody knows how much oil is there’. The Guardian may not know how much oil there is, but it manages to do a good job of casting enough seeds of doubt to make us think that neither does Alex Salmond.

Finally, both newspapers had Yes as a keyword. How did they represent the yes campaigners? The Guardian made reference to yes voters who are starry-eyed, fierce, enterprising, determined, hardline, vocal and proud. It has very little to say about the no voters, indicating a somewhat subtle sense that the yes voters are a little pushy in their sentiments. The Mail doesn’t mention characteristics of the yes voters much, although it does refer to Alex Salmond as shouty and describes the no campaign as floundering and lacklustre.

So, while both newspapers generally supported Scotland staying within the UK, they each did it by using different strategies and in a way which helped them to maintain their own identities, reflecting the concerns and interests of their readers. From this admittedly preliminary analysis it is difficult to draw a confident conclusion, but the Guardian did appear to make more of an effort to allow a range of positions to be represented, and was somewhat more subtle in its disapproval of the ‘yes’ campaign. The two newspapers also differed in what they said about each other in respect to the campaigning. The Mail barely mentioned the Guardian, only referring a couple of times to a Guardian poll that put Alistair Darling as scoring a victory over Alex Salmond during a two-hour debate. The Guardian was more critical of the Mail, however, using the campaigning to get in a few digs at the Mail. One writer sneeringly referred to ‘the Daily Mail’s insistence that anyone who wants to see a fairer society must be a Stalinist’, and another Guardian columnist expressed surprise that ‘I’m on the same side as the Daily Mail too! Which appears to be taking a short break from convincing us the UK has gone down the tubes to press home a slightly perplexing message of: hey, please don’t break up this wonderful hideous slutty drunken immoral country where women, gays and foreigners don’t know their place!’

Now the vote is over, the two newspapers can get back in their respective bunkers.


Swimming in the deep end of the Spoken BNC2014 media frenzy

As someone who enjoys acting in his spare time, I’m rarely afraid of the chance to spend some time in the spotlight. But as I sat one morning a few weeks ago in my bedroom, in nothing but a dressing gown, about to do a live interview on a national Irish radio station, with no kind of media training or experience under my belt, I really did get a case of the nerves. I would spend the entire day appearing on over a dozen radio and TV broadcasts (thankfully with time to get dressed after the first), promoting participation in the Spoken BNC2014 project, and finding out the true meaning of the phrase ‘learning on the job’. My experiences taught me a few things about the relationship between the broadcast media and academic research, which I’ve summarised at the end of this blog.

In late July, CASS and Cambridge University Press announced a new collaboration which aims to compile a new spoken British National Corpus, known as the Spoken BNC2014. This is an ambitious project that requires contributions of recordings from hundreds, if not thousands, of speakers from across the entire United Kingdom. As a research team (which includes Lancaster’s Professor Tony McEnery, Cambridge’s Dr Claire Dembry, as well as Dr Vaclav Brezina, Dr Andrew Hardie, and me), we knew that we had to spread the word far and wide in order to drum up the participation of speakers across the country.

So, at the end of August, we put out a press release which teased some preliminary observations, and invited people to get involved by email. These findings were based on some basic comparisons between the relative frequencies of the words in the demographic section of the original spoken BNC, and those of the first two million words collected for the Spoken BNC2014 project. We put out lists of the top ten words which had fallen and risen in relative frequency the most drastically between the 1990s data and today’s data.


Words which had declined    Words which had risen
fortnight                   facebook
marvellous                  internet
fetch                       website
walkman                     awesome
poll                        email
catalogue                   google
pussy cat                   smartphone
marmalade                   iphone
drawers                     essentially
cheerio                     treadmill
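For anyone wondering how a comparison like this works when the two datasets are different sizes, here is a minimal sketch. Raw counts are first normalised to a rate per million words, and words are then ranked by how much that rate has changed. The counts and the size of the older corpus below are invented for illustration, and ranking by simple difference is an assumption rather than the exact measure we used.

```python
def per_million(count, corpus_size):
    """Normalise a raw count to occurrences per million words."""
    return count * 1_000_000 / corpus_size

def frequency_change(old_counts, old_size, new_counts, new_size):
    """Rank words from biggest fall to biggest rise in relative frequency."""
    changes = {}
    for word in set(old_counts) | set(new_counts):
        old_rate = per_million(old_counts.get(word, 0), old_size)
        new_rate = per_million(new_counts.get(word, 0), new_size)
        changes[word] = new_rate - old_rate
    return sorted(changes.items(), key=lambda pair: pair[1])

# Hypothetical counts; only the two-million-word figure comes from the text.
old_counts = {"fortnight": 50, "email": 0}
new_counts = {"fortnight": 2, "email": 60}
ranked = frequency_change(old_counts, 5_000_000, new_counts, 2_000_000)
print(ranked)
```

The first items in the ranked list are the words that have declined most, and the last items are those that have risen most.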


It seems that these words really captured the imagination of the media powers that be. In the week of the release at the end of August, I was told on the Monday afternoon that the release had been sent out. By late that night, the story had already been picked up by the Daily Mail. Such was my joy, and perhaps naivety, that I sent out a brief and fairly humble blog post celebrating the fact that one person from one newspaper had run an article on our story. What I didn’t realise at the time was that, had I put out a blog post every time we discovered a piece of coverage the next day, I would still be writing them now.

The next morning I was woken by a message from Lancaster Linguistics and English Language department’s resident media celebrity, Dr Claire Hardaker, asking urgently for some information about the Spoken BNC2014 project. She had been contacted by LBC Radio, who had caught wind of the story and assumed sort-of-understandably that, since it was a linguistics story that involved Lancaster University, Claire would be directly involved. She isn’t, sadly, but they had lined up a live interview with her in twenty minutes’ time regardless, and she had kindly agreed to do it anyway with what information I could get to her in time.

After that, I soon realised that perhaps this story would garner more interest than a few newspaper articles. My phone went into meltdown, bleeping with emails from the PR team at the university and phone calls from unknown numbers. There was a 90-minute period where I couldn’t leave my room to get a shower, get dressed, and get on to the campus, simply because I was being lined up for so many interviews throughout the day. As such, I had to do my first there and then, in my dressing gown, while Claire Hardaker kindly waited on stand-by in the university press office in case I couldn’t make it to campus on time for my next.

Once I got there, it was a busy day of interviews right through to 6pm that evening. Over the course of the day, I was interviewed by international radio stations BBC World Service and Talk Radio Europe, UK national stations BBC Radio 4, Sky Radio, and Classic FM, Irish national station Today FM, and Russian national station Voice of Russia UK. I was also interviewed by UK regional BBC news stations London, Merseyside, Coventry & Warwick, Lancashire, and Three Counties. The highlight for me though was the TV interview with the Sky News channel, which I recorded using the Skype app on my little Windows tablet. The interviewer could see me, but I couldn’t see her (or indeed hear her all that well), and I had no idea that she was set up in the studio and that the video would be edited together and released that day. Aside from being shown on the Sky News television channel itself, and their website, the interview appeared on upwards of 40 regional radio websites, including Rock FM, Magic FM, The Bee, North Sound, Yorkshire Coast Radio, Wave 965, and Juice Brighton, as well as other media sites. Claire Dembry also got involved from Cambridge, doing further TV interviews with Sky News and even joining me for a live double interview with BBC Radio London.

So, what did I ‘learn on the job’ through my baptism of fire in the media world? Three main points:

  • Some interviewers thought I was announcing the death of the English language

Though most of the interviews went about as smoothly as I could have expected, with me remembering to plug the project’s email address at any given opportunity, some were much harder work. Some interviewers seemed horrified at the thought of ‘losing’ words such as marvellous and cheerio, and wanted me to tell them what they could do to help rescue them. Though it was tempting to say “well if you keep saying them they won’t disappear…”, I instead politely made the point that language, like everything else to do with being human, changes over time, and that this is perfectly okay. Just like fashion. This ‘endangered species’ discourse came about in a few interviews, and it seemed that the interviewers felt I was suggesting that the English language was somehow shrinking or degrading over time.

  • Some interviewers thought I was actively promoting the changes I was reporting

In other cases, the interviewers seemed to imply that I was making recommendations for the words that speakers should avoid or should start saying more, in order to ‘stay up to date’ and not come across as ‘old-fashioned’. In other words, I was mistaken for a prescriptivist rather than a descriptivist: someone trying to stop people from using the word catalogue, or encouraging everybody to say the word treadmill at least five times a day.

  • Some interviewers asked ‘nice’ questions, and some didn’t

This is a more general observation which I suspected to be the case before I started, and had it confirmed as the interviews went on. It is a simple truth that the interviewers who ‘got’ the project the most were the ones who, for me, asked the best questions. When being interviewed about the list of words which have decreased in frequency I was, in varying forms and among many others, asked the following two types of question:

A: The words which were more popular in the 1990s but not so much now – tell me about ‘pussy cat’ – what’s going on there?

B: The words which were as popular in the 1990s as Facebook is now – I guess words like ‘marvellous’ and ‘catalogue’ are harder to spell and we’re getting lazier these days so we’re just going to say shorter words aren’t we?

For me, and I imagine many others, question A is the ‘nice’ question of this pair. The interviewer draws me to one example which looks interesting – fair enough – but importantly they make no inference themselves about the possible explanation. They set up a blank canvas and allow me to paint it in the way which is most advantageous to my purpose.

Question B, however, is much more problematic for me as the interviewee and sadly occurred as often as, if not more often than, questions like A. Firstly, the interviewer has re-conceptualised the findings and created equivalence between the frequency of the declining words and the words on the rise. Therefore the possibility of conclusions like “marmalade used to be as popular as Facebook” or, worse, “iPhones replace pussy cats in British society” is opened up and thrown into the ether.

Secondly, and much harder to deal with immediately, is the lumping together of two completely unrelated words (marvellous and catalogue), the assumption of societal degradation (we’re getting lazier), the pseudo-logical causal relationship between written conventions and spoken interaction (harder to spell), which is based on such assumptions of societal degradation (so we’re just going to say shorter words), and, the icing on the cake, the tag question which invites me to agree that everything the interviewer has just said is perfectly correct (aren’t we?). Yes, this is indeed not a nice question. The strategy I developed was to say that yes, everything you have just said could be the case, and then to go about repackaging their question into something more reasonable for me to say anything about. This was not easy and in some cases I did this better than others!


The recurring theme of my experience was the extent to which the interviewers’ expectations of the Spoken BNC2014 research matched what we are actually trying to do. Most of the time, there was a close match and the questions fit my aims well. In the cases where this didn’t happen, and the questions made all sorts of false assumptions, life was more difficult. I don’t think, however, that anyone was deliberately misconstruing our humble aims, and really I’d rather have given those difficult interviews, where I felt like I was in a fight for mutual understanding, than not to have given them at all for fear of being misunderstood. It seems that this is an inevitable aspect of daring to throw your work out of the bubble of academia and into the public sphere, where it really matters. My goal for next time is to improve the way that the research is communicated in the first place, and to plug potential potholes of misunderstanding in a way that is as accurate as reasonable but still makes a good story.

Overall, I think I managed as well as I could have done, given the abrupt start to the day and my naïve expectation that the press wouldn’t be as interested in the story as it turns out they were. Hopefully we’ll have generated lots of interest in the project. I’d like to thank Claire Hardaker for helping me learn the ropes as I went along, the staff at Lancaster University’s press office for keeping me in the right place at the right time, and the ESRC, who have since offered me some media training, which I will very gladly accept. Awesome!

Corpus linguistics MOOC: Second run beginning soon

We are running the corpus MOOC again – and we are really looking forward to it. In the first run of the course we taught social scientists and other researchers from across the globe about how to use corpus linguistics to study language. We looked at a range of topics of contemporary social relevance in doing so – including how we talk about disability and how newspapers write about refugees. We also looked at key areas where corpus linguistics has contributed greatly, notably the areas of dictionary construction and language teaching.

The result, I must say, exceeded our expectations – which were pretty high. People really seemed to like the course and get a lot from it. Even though the approach was entirely new to most students, a very large number worked through all eight weeks of the course. The feedback on our training has been exceptionally strong – a look at the #corpusMOOC hashtag on Twitter will give a good idea of the overwhelmingly positive response to that course. The following quote, from a Chinese notice board on which our MOOC was discussed, gives a strong sense of how the course succeeded both in training students and in showing them that corpora have a key role to play in exploring social science questions (thanks to Richard Xiao for the translation):

“CorpusMOOC, with its assembly of the best corpus linguists and rich content, cannot be praised enough … The greatest benefit for me has been that the course has widened my vision: corpus linguistics and the applications of corpus technologies have gone far beyond what I had imagined – more resembling big data in the field of social science research instead of being confined to linguistics… I think the significance of this course lies not merely in teaching a large number of corpus techniques but more, rather, in introducing corpora and demonstrating what corpora can be used for, thus making us aware of them and helping us understand their importance … the corpus-based approach is the unavoidable approach to language in future.”

The first run of the MOOC had a great impact – the course was taken mainly by women (70.44% of students), and drew participants from all continents and a wide range of countries – including places as far flung as the British Antarctic Territory! The areas in which course participants were working and researching were heavily oriented to the social sciences, with students drawn from areas such as business consulting and management, health and social care and media and publishing. The greatest contribution of the course, however, seems to have come from providing training to teachers/lecturers in the UK and beyond. Given that the great majority of students were taking the course for career development (78.59%), the course was likely not only to have had a strong effect on this group but also, by extension, on the students who are exposed to the ideas in the course by the teachers/lecturers who took it.

Having read this, you can probably understand why we were keen to run the course again. Through it we have been able to get a good understanding of corpus linguistics across to thousands of people around the globe. We have made a few changes to the course based on the feedback we received – all designed to make a good course better! This includes new lectures (for example on the language used in cancer treatment) and new ‘in conversation’ pieces with corpus linguists (such as Douglas Biber).

If this run of the course proves as popular as the first, which we think it should, we plan to run the course every September. Who knows when we will stop!

For a limited time, registration is still open. Book your place on ‘Corpus linguistics: method, analysis, interpretation’ now. 

Introducing the Corpus of Translational English (COTE)

We are pleased to announce that CASS has recently compiled another new corpus, the Corpus of Translational English (COTE). The construction of COTE is supported by the joint ESRC (UK) – RGC (Hong Kong) research project, “Comparable and Parallel Corpus Approaches to the Third Code: English and Chinese Perspectives” (ES/K010107/1). The project is led by Dr Richard Xiao and Dr Andrew Hardie at CASS in collaboration with Dr Dechao Li and Professor Chu-Ren Huang of the Hong Kong Polytechnic University.

COTE is a one-million-word balanced comparable corpus of translated English texts, which is designed as a translational counterpart of the Freiburg–LOB Corpus of British English (F-LOB). The new corpus is intended to match F-LOB as closely as possible in size and composition, but is supposed to represent translational English published in the 1990s. Like the F-LOB corpus, COTE comprises five hundred text samples of around 2,000 words each, which are distributed across 15 text categories. The corpus is created with the explicit aim of providing a reliable empirical basis for identifying the typical common features of translated English texts and investigating variations in such features across different types of text on the basis of quantitative analyses of the balanced corpus of translational English in contrast with comparable corpora of native English.

Like many balanced native English corpora such as F-LOB, COTE includes metadata information such as text type and date of publication as well as linguistic annotation such as part-of-speech tagging. But as a translational English corpus, COTE additionally includes various translation-specific metadata, e.g. the source language, translator, date and source of publication in the header of each text sample, which makes it possible to categorize the texts to suit different research purposes. The corpus is currently restricted for in-house use by the project team. It will be released and made accessible online when the project is completed.

Related outputs:

Hu, X. (2014). Does the Style of Translation Exist? A corpus-based Multidimensional Analysis of the stylistic features of the translated Chinese. Paper presented at the 2nd Asia Pacific Corpus Linguistics Conference. 7 – 9 March, the Hong Kong Polytechnic University.

Hu, X. & Xiao, R. (2014). How different is English translation from native writings of English? A multi-feature statistical model for linguistic variation analysis. Paper presented at the 35th ICAME conference. 30 April to 4 May, the University of Nottingham.

Hu, X. & Xiao, R. (2014). What role do Source Languages play in the variation of translational English? A corpus-based survey of Source Language interference. Paper presented at the 7th IVACS conference, 19-21 June 2014, Newcastle University.

Xiao, R. & Hu, X. (2014). General tendencies and variations of translational English across registers. Paper presented at the 4th UCCTS conference, 24-26 July 2014, Lancaster University.

McEnery, A. & Xiao, R. (2014). The development of corpus linguistics in English and Chinese contexts. In Ishikawa, S. (ed.) Learner Corpus studies in Asia and the World: Papers from LCSAW2014, Vol. 2, pp. 7-45. Kobe, Japan: Kobe University.

Hu, X., Xiao, R. & Hardie, A. (under preparation). How do English translations differ from native English writings? A multi-feature statistical model for linguistic variation analysis.

Latest news on the CASS/iCourts collaborative investigation into the language of the law

Earlier this year, a formal collaboration between iCourts and CASS was signed based on our centres’ joint interest in the corpus-based investigation of language in the context of law. We are motivated to analyse legal data linguistically, because law is practiced in language, legal judgements are texts, legal arguments are phrases in texts, and legal concepts are expressed in words. One primary argument against analysing legal language from a linguistic perspective is that the data tend to be extremely formulaic and objective. However, findings from our collaborative analyses have shown that legal language shows elements of both fixedness and variation. Both sorts of patterns were exposed using corpus-based critical approaches to language.


Sigrun Larsen (Dept. of Law, Lancaster University), Matt Fisher (Tripod Software), Ioannis Panagis (iCourts, University of Copenhagen), Anne Lise Kjær (iCourts, University of Copenhagen), Amanda Potts (CASS, Lancaster University), Tony McEnery (CASS, Lancaster University), Henrik Stampe Lund (iCourts, University of Copenhagen), Paul Rayson (CASS, Lancaster University), Laurence Anthony (CASS visitor, Waseda University)

On our first collaborative project, “Decoding the rule of law: Corpus-based discourse analysis of the construction of achievements of the International Criminal Tribunal for the Former Yugoslavia (ICTY)”, I serve as P.I., collaborating with C.I. Anne Lise Kjær of iCourts. This month, I traveled to Copenhagen to spend 1.5 intensive weeks working at the University of Copenhagen. I arrived prepared to work with two corpora that had previously been collected and cleansed with the help of Matt Fisher (Tripod) and Ioannis Panagis (iCourts): 1) All of the trials and appeals published thus far by the ICTY (10.5 million words); and 2) Annual reports published by the ICTY from 1994-2013 (425,000 words).

Using frequency lists, (contrastive) collocation analysis, n-gram description, and key semantic domain analysis, we have demonstrated the ways in which legal language remains rigid and fixed, and also described instances in which variation occurs. Because trials (and, to a lesser extent, appeals) are intended to be self-contained documents, we have also been able to trace cases where variation in legal language became problematic, leading to confusion in the court and increased time and money spent in search of justice.

Analysis on the first phase of our project is now complete, and initial results are being disseminated. I presented findings with my collaborator Anne Lise Kjær last week at the fifth international conference for Critical Approaches to Discourse Analysis across Disciplines (CADAAD) at ELTE (Loránd Eötvös University) in Budapest, Hungary. A paper outlining our recommendations for corpus-based critical analyses of legal language and featuring detailed findings of this initial study is in the final stages of preparation, and will be available next year.

Notes from the SILK Road International Summer School

In July 2014, I and four other students from the Faculty of Arts and Social Sciences (FASS) at Lancaster University (Sophie Barker, James Lester, Eleanor Richards-Johnson, and Gillian Smith) travelled to Hong Kong to attend the SILK Road International Summer School.


The three-week summer school, organised by Hong Kong Polytechnic University (PolyU), in affiliation with Xi’an Jiaotong University (XJTU), was attended by students from countries all over the world, including Hong Kong, mainland China, the United Kingdom, Australia, the United States of America, Thailand, and South Korea. Its aim was to encourage students to “Study in an Intercultural environment and Learn to be Kreative” (SILK), and this was made possible by hosting an internationally diverse cohort of students. At the helm of the summer school was Lancaster University Linguistics alumnus Dr Xu Xunfeng, who accompanied us for the entire duration of the course.

We took two out of a choice of four credit-bearing university modules. These courses, usually delivered across an entire term, were adapted to be taught intensively. As such, we received eight hours of contact time every Monday, Wednesday, and Friday, and were required to prepare readings and assignments in between classes. The courses offered were:

The first week took place at PolyU in Hong Kong, where we were housed in PolyU’s student accommodation. The second and third weeks were hosted at XJTU in mainland China, where we were accompanied and taught largely by the same staff from PolyU, and stayed in a hotel. Each module differed in terms of assessment style, but they all concluded with group presentations on the final day of contact time, which consolidated some aspect of the learning experience. At the end of the course, we returned to Hong Kong for one more night before travelling home.

In addition to taking classes, we were taken on day trips every Tuesday, Thursday, and Saturday to a series of cultural sites in both Hong Kong and Xi’an. These included the Terracotta Army at the Mausoleum of the First Qin Emperor, the Zhongnan Mountains, the Wild Goose Pagodas, and the Tang Dynasty Palace Theatre. There was also some free time for us to explore both Hong Kong and Xi’an independently – all in all, we certainly had a chance to squeeze in a fair amount of sight-seeing amongst all the studying!

This was the first time that the SILK Road International Summer School had taken place, and it proved to be a valuable, educational, and enjoyable experience for all of us who were lucky enough to be there. The organisers have already announced that the summer school will run again next year, and I hope it is even more successful than this year. I am very grateful to both Hong Kong Polytechnic University and FASS at Lancaster University for funding our trip.

The Spoken BNC2014 project features in the Daily Mail

The recently announced collaboration between Cambridge University Press and CASS, the Spoken BNC2014 project, has made headlines in the Daily Mail.

The article, entitled “No longer marvellous – now we’re all awesome: Britons are using more American words because traditional English is in decline”, describes the preliminary findings of the project, which is in its early stages.

To participate in the project, native British English speakers from all over the UK can record their conversations and send them to us as MP3 files. For each hour of good-quality recordings we receive, along with all associated consent forms and information sheets completed correctly, we will pay £18. Individual recordings do not have to be one hour long; participants may submit two 30-minute recordings, or three 20-minute recordings, and will receive £18 for each hour in total.

To register your interest in participating, please email corpus(Replace this parenthesis with the @ sign)

In memory: Professor Geoffrey Leech

It is with great sorrow that we report the death on 19th August of Professor Geoffrey Leech.

Geoff was not only the founder of the UCREL research centre for corpus linguistics at Lancaster University, he was also the first Professor and founding Head of the Department of Linguistics and English Language. His contributions to linguistics – not only in corpus linguistics, but also in English grammar, pragmatics and stylistics – were immense. After his retirement in 2002, he remained an active member of our department, not only continuing his own research but also, characteristically, providing advice, support and encouragement for students and junior colleagues.

All our thoughts are with Geoff’s wife Fanny, and with his family.

It is still hard for us to find the right words at this time. For many of us he was an inspirational teacher and mentor, but for all of us, he was a kind and generous friend.

The video below was recorded by Tony McEnery in conversation with Geoff in late 2013 for Lancaster’s online course in corpus linguistics. In it, Tony and Geoff discuss the history of the field. We present it now publicly as a first tribute to Geoff’s life and work.

(A transcript is available from this link.)

Gypsies, tramps and thieves? UK national newspaper depictions of Romanians and Bulgarians analysed

British tabloid newspapers repeatedly associated Romanians – but not Bulgarians – with criminality and anti-social behavior during 2012-2013, a comprehensive new “big data” report by Oxford University’s Migration Observatory shows.

The report Bulgarians and Romanians in the British national press was undertaken by CASS Challenge Panel Member William Allen and Dora-Olivia Vicol at the Migration Observatory at Oxford University. It provides a detailed analysis of the language used by 19 British national newspapers to discuss Romanians and Bulgarians between December 1st 2012 and December 1st 2013. The analysis encompasses 4,000 articles, letters and comment pieces mentioning Romanians and/or Bulgarians, a total of more than 2.8 million words.

Key findings include:

  • Language used by tabloid newspapers to describe and discuss Romanians as a single group was frequently focused on crime and anti-social behavior (gang, criminal, beggar, thief, squatter). This was less prevalent in broadsheet newspapers.
  • Where Romanians and Bulgarians were discussed together, this was consistently in the context of immigration, across both tabloid and broadsheet newspapers.
  • Verbs used to describe or discuss Romanians and Bulgarians together, across both broadsheets and tabloids, were frequently related to travel (come, arrive, move, travel, head). In tabloids these included metaphors related to scale (flood, flock).
  • Words appearing before “Romanians and Bulgarians” in both tabloid and broadsheet newspapers were frequently related to prevention of movement (tabloids: stop, control, block; broadsheets: deter, restrict, dissuade).
  • References to Romanians and Bulgarians together were frequently associated with specific numbers, across both tabloid and broadsheet newspapers. The most common specific numbers were 29 million – the approximate combined populations of Romania and Bulgaria – and 50,000 – a prediction from MigrationWatch, a pressure group which campaigns for reduced immigration, of how many A2 migrants would be added to the UK population each year for five years following the end of transitional controls.

Some language associated with stories unrelated to UK migration was also evident – particularly Romanian abattoirs implicated in the horsemeat scandal and the blonde Bulgarian Roma child who sparked an ‘abduction’ investigation in Greece.

William Allen, co-author of the report, said: “The report is valuable because it provides a comprehensive account of how British national newspapers discussed Romanians and Bulgarians during a key period. The language used to describe Romanians – particularly in tabloid newspapers – often mentions them alongside criminality and anti-social behaviour, while this was not the case with Bulgarians.” Read the full report here.

How to be a PhD student (by someone who just was), Part 3: Towards the viva

After successfully defending my viva early this year, I’ve been sharing some of the lessons I learned over my 38 months as a PhD student. In this installment, I talk about powering through your final year and preparing for your viva. 

If you missed the previous entries, click through to read Part 1 (Preparing for the programme) or Part 2 (Managing your work and working relationships). 

Coming down the final stretch

When you absolutely can’t stand the sight of your PhD, you know you’re nearly finished with it. From speaking to my friends and colleagues, this tends to happen around 8-10 months before submission, which means that you get about 40 weeks of steely focus, single-mindedly trying to get the demon out of your computer and into the hands of your examiners. This is a testing time for your personal relationships and for your scholarly stamina, but a most excellent time for your academic work.

I’ve yet to meet someone who had the problem of too little material for their PhD (though I suppose they might be out there), so remember ABC: Always Be Cutting. When re-reading your work, keep a sharp eye out for words and phrases such as basically, simply put, in other words, and so on. These are clear indicators that you’ve been repetitive and could be more succinct.

Don’t be afraid to be absolutely ruthless in editing and rewriting, especially in this magical 8-10 month period where you just want it gone. Print out a copy of your research questions and hang them somewhere in sight of your working space. As you finish your analyses and revise your structure, make sure that all words serve the research questions. If you find that your work drifts, you have two choices:

  1. Revise the research questions to match what you researched. It is the worst-kept secret in the academic world that research questions posed in the infancy of a project might not be those we end up answering along the way. This is totally natural. What’s unnatural is if your research questions and chapters/analyses do not evolve together, and your thesis ends up looking more like a centaur than a human or a horse. Pick a human or a horse, and run with it!
  2. Remove analyses that do not directly contribute to the thrust of your thesis. This can be very painful, but is almost always necessary. You do so much work during the PhD that you want to be able to show it all off at the end. But the truth of the matter is: not everything is relevant, and 80,000 words cannot hold the entirety of your own knowledge, let alone the accumulated learnings of the human race. If you find analyses that are clear departures from your research questions, remove these from the main document and save them in a series of new files to turn into papers when you’re ready. Summarise each of these in bullet points, and you can add them into the ‘further work’ section of your thesis, which means that you can still demonstrate that you’ve thought about (and even journeyed toward) new directions in your work. The upside here is that you have a clear path to follow-on publications.

Remove distractions. Be selfish. This is a very short time in your life where it is perfectly fine to just stay the course and keep your eyes on the prize. Surround yourself with understanding, patient, and supportive people. Work each day until you are no longer productive, and then relax by doing something that is neither mentally exhausting nor mentally destructive. Try your best to stay flexible and (self-) reflexive.


Staying flexible and reflexive

Everyone who starts a PhD is a perfectionist, to some degree. We all came to this point (the highest tertiary degree on offer) with a unique mixture of natural talent, intellectual curiosity, mental fortitude, and real hard-headedness. You and many of the people in your cohort will have been at the top of your Master’s or Bachelor’s classes, or come from a solid career in industry. The thing about a PhD, though, is that it is designed to be both finite and imperfect.

In the postgraduate socialising area of the linguistics department at Lancaster, we once hung a sign that said, “There are two kinds of PhD: Perfect, and finished”. Choose ‘finished’! The last year of your PhD will break your heart, because that’s when you realise just how much you can do in your finite period, and more devastatingly, how much you just cannot fit in. I can’t remember who told me this, but whoever it was should step forward (because I owe you a drink):

Your PhD is not your great work. If you stay in academia, it is almost certain to be your worst work.

We do this to prove that we can do greater things if given more time, money, chances, collaborators, experiences. If you save all of the interesting things that you can’t fit into your PhD into separate folders, you have a good head start on papers that you can publish either during or directly after your doctorate. You can easily fill up a ‘Future Work’ section in your final chapter. And most importantly, you can finish your PhD.

As soon as I let go of the idea of my thesis as this all-encompassing, nearly-perfect, staggering contribution to science and accepted the fact that it was just the best version of many possible (apprentice) books that I could have written in that time, it just flowed out of me.

For instance, throughout my thesis, I worked on a method of downsampling that could help researchers who were, like me, working with very large corpora resulting in hundreds or thousands of collocates per search node. To make sure that this method was applicable to different data sets, I did two case studies, and I was able to refine the method quite dramatically in the second half of my study. As I wrote up the second half of my PhD, I agonized about the first half, which was completed and written up using the now-outdated, subpar version of the method. “Do I have to go back and redo the entire thing?” I wailed to my long-suffering supervisor. “It will be more perfect if I do”. In his wisdom, my supervisor suggested that I find a way to turn these lemons into lemonade, rather than turning them into 6 months of additional hard labour.

In the end, I presented my PhD warts and all. I was transparent about my ‘research journey’, which my examiners looked upon very favourably. Remember that this process is meant to be hard work; totally whitewashing your PhD by removing all traces of earlier errors, and thereby denying yourself the ability to weave in a narrative about the learning experience itself, will not do you many favours. Also, including brief notes about where you went wrong, how you identified problems, and what you did to fix them will help future PhD students immensely. Everyone who opens your thesis afterwards can avoid reinventing the wheel you already sweated over – they can focus on their own unique and novel problems!

Choosing your panel

Choosing the people who will sit on your panel is one of the most crucial decisions of your doctorate. In the UK, we generally have four panels: a pre-confirmation, a confirmation, a post-confirmation, and a viva voce.

The pre-confirmation happens during your first year, and generally checks your progress and working relationship with your supervisor. I suggest choosing an examiner who is (even marginally) in your field and can make some comments about your literature review and some suggestions for possible directions in your work. The most important trait of an examiner in the pre-confirmation (in my opinion) is that they are supportive and kind. Choose someone who will boost your confidence for the road ahead!

The confirmation panel (in my department, taking place in the second year) confirms the movement from PhD student to PhD candidate. This panel is high-stakes, as failing it can mean a significant delay in finishing your PhD, or even discontinuing it completely. Despite this pressure, I recommend choosing the toughest possible person from your department to examine your confirmation panel. For this spot, you want the person most likely to pick holes in your theoretical and methodological choices while there is still time to adjust before the viva. If you choose correctly, your confirmation will be the hardest panel of your PhD – mine certainly was!

The post-confirmation panel happens in the third year of the PhD here, and checks that you have settled on research questions and are on target to submit. Your examiner should be someone quite critical about research questions and design, but also someone who you feel that you can trust and talk to, particularly if you’re encountering issues. This is your last panel before the viva, so it’s a good place to take the temperature of your overall research design and to get a bit of a confidence boost or a reality check.

Finally, we come to the viva. At Lancaster, this happens after 3-4 years of PhD study. I know that some universities don’t give students much control over the members of their panel, but I urge you to have an open dialogue with your supervisor about this. The people sitting in those seats can not only change the outcome of the day, but also have a lasting effect on your career. For my viva, I needed to have three examiners: one internal and two external. (At Lancaster, your supervisor is present during the viva, but cannot speak.) I chose a variety of scholars who have all used corpus linguistic methods in their work, and whose previous findings were echoed in my thesis. I knew that they would be critical of my work, but would most likely receive it positively. At this point in the process, you want to engage in a lively debate about your research, but you do not want it to be a negative or a defensive one.

Preparing for the day

This was quite controversial at the time, but I told only three people (my partner and two very close friends) which day my viva was on. I was freaking myself out enough counting down the days to V-Day; I didn’t want a dozen other friends (as well-intentioned as they might have been) ramping up the pressure by constantly reminding me of the impending panel.

You’ll likely have quite a bit of practice describing your research from speaking to fellow students, scholars, and conference attendees. However, speaking to influential people in your field is much different; it’s a good idea to practice some answers just in case you find yourself freezing up on the day. Here are some questions that could/maybe/will come up in a viva:

  1. Explain your thesis in fewer than 5 sentences.
  2. Explain your thesis for a layman.
  3. What is the one idea that links the entire work together?
  4. What motivated you, personally, to undertake this work?
  5. What do you think the main contribution of this work is?
  6. What was the most crucial decision that you made in designing/structuring/undertaking this work?
  7. Do you think you could have done better work with more data or less data?
  8. How have you, as a researcher, influenced the outcome of this analysis? What safeguards have you put in place against this?
  9. How has the process influenced you? Has your view of the data/circumstances/research topic changed over the course of the degree?
  10. Summarise your major/key findings. Are any of these surprising? Why are they interesting?
  11. Who will find this work most interesting? Do you think it’s accessible to this audience?
  12. Do you have plans for distributing these results to non-academic audiences? What about the contributors/stakeholders?
  13. How would you begin future research?
  14. What sort of advice would you give future PhD students? (Maybe you can write 3 blog posts about it!)
  15. Why do you think that this merits a PhD? (This is the toughest question in the book, and I think it’s only asked in extenuating circumstances, but best to be prepared.)

The best thing that I did to prepare for my viva (personally) was to read through my thesis one last time, with comments and Track Changes turned on in MS Word. I got a head start on correcting typos that were later spotted by my examiners, and I was able to add comments expanding on some areas that I thought might fall under their scrutiny. Because I was reading the thesis closely enough to edit it, I really re-familiarised myself with the content (much of which I had blocked out in the two months between submitting and defending it). When I was done, it was this copy that I printed and brought with me in a ring folder to the viva. I’ve seen a lot of people put post-its and highlights all through their theses, but I just put tabs on each chapter and post-its marking the areas I thought we’d turn back to regularly: 1) key words; 2) details of corpus design; 3) final comments. I’ve heard of people bringing stacks of books to their viva, but if a critical reference isn’t contained within your PhD, you have much bigger problems! The printed, annotated copy of my own thesis was totally adequate.

On the day itself

On the day of your viva, try not to do anything that makes you more anxious than normal. For instance, I’m a coffee addict but I only had one cup that morning, resisting the urge to chain-drink the stuff to get some rocket fuel before the main event. Try not to run around the department like the sky is falling, or to haphazardly skim-read your thesis; you know what’s in there. Go about your business like it’s a normal day and then go to talk about your work with some people.

That’s important enough to bear repeating: you’re just talking with people. During the viva, remember to be as respectful and as grateful as possible, and you will (most likely) be treated with kindness in return. Examiners read hundreds of pages — for free — and often travel great distances just to discuss your work with you. Be gracious about this! Not everyone is entitled to a smooth, friendly viva, but we all hope that we get one.

So when an examiner asks you a question about your thesis (your baby! your precious!), answer as calmly and objectively as possible. They are genuinely curious! Famous people! About your work! Remember that nobody has read and done paperwork and travelled to be horrible to you.


Unlike in sports, the best defense is not a good offense. If the PhD is an apprenticeship, the viva gets close to teaching new scholars what it is like to present to the toughest crowd at a conference, or to get back the most detailed peer review from a journal. In almost all cases, you can accept what your examiner says, or thank them for their comment and think over the ramifications later. This is not to say that you should go limp during the viva; if you feel misunderstood, or if you feel as though a challenge to your theoretical framework/methodology/research design is unfounded and can be easily responded to, do your best to present your perspective. But much of the viva is a group of very clever, very curious people asking questions, hoping for clever, interesting answers. If you are able to get into this mind-set, you might actually be able to do the unthinkable and enjoy your viva. If you manage to impress examiners with both your work and your congenial attitude, your viva might also be the birthplace of new collaborations or lasting scholarly relationships.

You can do it!

This was my last post in the series. If you have any questions about being a PhD student, or if you’re considering doing a PhD at Lancaster University, please get in touch! You can email me at a.potts(Replace this parenthesis with the @ sign) or follow me on Twitter @WatchedPotts