A three-parent baby or a change of battery? Language in the ethical debate on mitochondrial donation

On 22nd October 2014, the House of Commons Science and Technology committee will hold a one-off evidence session on a new human fertilisation technique variously known as mitochondrial donation, mitochondrial transfer or mitochondrial replacement. This technique is intended to help women who carry serious genetic diseases (e.g. muscular dystrophy) that are passed to the embryo through the mitochondria, the energy-producing structures found in the egg outside its nucleus. In such cases, the cell’s mitochondria would be replaced with mitochondria from a healthy donated egg immediately before or after fertilisation, thus eliminating the possibility that the child will inherit the genetic disease.

The first embryo with donated mitochondria was successfully created at Newcastle University in 2010. In 2012, the Nuffield Council on Bioethics approved the procedure. However, the technique has not yet been legally approved in the UK. Two public consultations have found that the majority of people are in favour of introducing the technique, but have also revealed some opposition. Previous parliamentary discussions have primarily focussed on the safety of the procedure. However, concerns have been expressed both in Parliament and in the media about the ethics of manipulating the genetic make-up of human embryos.

As far as the ethical issues are concerned, the language used to describe the procedure is crucial, especially in media reporting. In order to study this language systematically, we constructed a dataset (corpus) including all relevant news reports published in the UK press between April 2010 (when the Newcastle team announced the success of the technique), and September 2014. The corpus contains a total of 119 news articles, amounting to 64,804 words. We have found that, in our data, the words used to express the case for or against approval frame the issue in opposite and irreconcilable ways. This, we suggest, reduces the chances of a reasoned debate, and makes it difficult to see the merits of the case.

The case in favour: changing a faulty battery

In April 2010, Newcastle University issued a press release in which one of the directors of the research, Professor Doug Turnbull, explains the new procedure as follows:

“Every cell in our body needs energy to function. This energy is provided by mitochondria, often referred to as the cells’ ‘batteries’. Mitochondria are found in every cell, along with the cell nucleus, which contains the genes that determine our individual characteristics. The information required to create these ‘batteries’ – the mitochondrial DNA – is passed down the maternal line, from mother to child.”


“What we’ve done is like changing the battery on a laptop. The energy supply now works properly, but none of the information on the hard drive has been changed,” […] “A child born using this method would have correctly functioning mitochondria, but in every other respect would get all their genetic information from their father and mother.”

The ‘battery metaphor’ is one of the main rhetorical strategies used in our data to suggest that the procedure poses no ethical issues, and should thus be approved on medical grounds: most people can relate to how changing the battery in an appliance does not affect its essential characteristics. The noun battery occurs 38 times in the data, including both the singular and plural forms. We used a new software tool to find the top ‘collocates’ of the singular form battery, i.e. the words that are strongly associated with this word in our corpus. This tool displays collocates as a network with the search word in the centre (see figure 1).


Figure 1 – Collocation network for battery
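
As a rough illustration of what finding collocates involves, the sketch below counts the words that occur within a fixed window around a search word and ranks them by mutual information (observed co-occurrences compared with the number expected by chance). The window size, the minimum co-occurrence threshold, the choice of mutual information as the association measure, the toy text and the function name `collocates` are all assumptions made for illustration; they are not the settings of the tool used to produce the networks in this post.

```python
import math
import re
from collections import Counter


def collocates(text, node, window=2, min_cooc=2):
    """Rank words found within +/- `window` tokens of `node` by mutual information."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n = len(tokens)
    freq = Counter(tokens)
    cooc = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Collect every token in the window around this occurrence of the node.
        for j in range(max(0, i - window), min(n, i + window + 1)):
            if j != i:
                cooc[tokens[j]] += 1
    scores = {}
    for word, observed in cooc.items():
        if observed < min_cooc:
            continue
        # Co-occurrences expected if node and word were unrelated.
        expected = freq[node] * freq[word] * 2 * window / n
        scores[word] = math.log2(observed / expected)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy text, invented for this example (not taken from the corpus).
sample = (
    "Changing a faulty battery is simple. The faulty battery was replaced. "
    "A new battery pack powers the laptop. The faulty battery pack failed."
)
ranked = collocates(sample, "battery")
for word, score in ranked:
    print(f"{word}\t{score:.2f}")
```

On a real corpus, grammatical words such as the and a would usually be filtered out or pushed down the ranking by a stricter frequency threshold or a different statistic.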

Battery is closely linked with the technical term mitochondria on the one hand, and, on the other hand, with a small set of words that belong to the ‘battery’ metaphorical scenario: pack, faulty, replacing and changing. The extracts below are instances of the pattern displayed in figure 1:

About one in 6,500 children are born with defects in their mitochondria – the “batteries” that power each cell.

The new techniques would see defects in a cell’s battery pack, the mitochondria, replaced by a healthy version supplied by a woman donor

[Mitochondria] are like batteries in a camera or a laptop – you can change them without changing anything else. The child’s identity will come from its two parents, who determine the nuclear DNA.

In these extracts, the focus is on the way in which serious medical problems can be avoided by means of an intervention at the level of cells.

The case against: three-parent babies

The case against approval focuses on the babies who would be born as a result of the procedure, and particularly on their kinship relationships with the people whose cells would be involved in the creation of the embryo: the woman who carries the genetic disease, the woman who donates the healthy mitochondria, and the man whose sperm is used to fertilise the egg.

The word baby as a singular noun occurs 99 times in the corpus, and the plural form babies occurs 268 times. Figure 2 shows the network of words that centres around the plural form babies in our corpus.


Figure 2 – Collocation network for babies

As figure 2 shows, the collocates of babies include:

  • Words that relate to the debate, and to the issue of official approval: approve, legalise, draft, sanction, permit, backing, comment, ministers.
  • Words that relate to the procedure itself and its outcome: create, created, creation, order, genetically, modified, GM, designer, eugenics, three, three-parent.

The second group in particular reveals the main argument against approval of the procedure, namely that it involves the creation of genetically modified babies with three biological parents. This, it is argued, would pave the way to a future where prospective parents can choose the characteristics of their children, such as eye colour. The following extracts express this position:

Three-parent babies may never know their ‘second’ mother

Government accused of dishonesty over GM babies

Dr David King, of watchdog Human Genetics Alert, said: “This will eventually lead to a designer baby market. [...]”

Done differently, it could lead to the creation of designer babies, made to order by hair colour or eye colour.

More specifically, the corpus contains 40 instances of three-parent baby/babies, 33 instances of designer baby/babies and 12 instances of GM babies. In some articles, these phrases are used to place mitochondrial donation alongside other ethically controversial issues:

Issues ranging from fracking to three-parent babies and genetically modified crops are all difficult […].

The problem with the two alternative linguistic framings

The cases for and against approval of mitochondrial donation are expressed in the press in ways that polarise the issue in an extreme, and arguably unhelpful, fashion. In the case against, the creation of a human baby from the genetic material of three people results in a genetically modified, designer human being, and in an abnormal kinship relationship involving two mothers and three parents. In the case in favour, the use of mitochondria from a donated egg is a mechanical process that has negligible genetic implications and no abnormal kinship implications at all. More generally, the case against focuses on the people involved in the process and their relationships, while the case in favour focuses on what scientists do in a lab in order to prevent serious incurable conditions. As figures 1 and 2 show, the two networks centering on babies and battery do not meet: they have no words in common. For example, the verb form associated with the battery network is replace, whereas for babies it is create.

In this context, it is difficult for non-experts to make sense of the complex scientific issue that underlies the ethical questions, namely the function of mitochondria and their role in the genetic make-up of human beings. Those who adopt the ‘battery metaphor’ tend to point out that mitochondria only provide 0.1% of a human being’s genetic material, none of which influences the characteristics that we associate with identity and uniqueness. Those who adopt the ‘three-parent’ view implicitly suggest that two women are equally involved in the creation of the embryo, presumably because the provision of any amount of genetic material would constitute biological parenting.

The language used in the media to represent both sides over-simplifies and polarises the issue, and therefore makes it difficult to understand the basis of the disagreement. It would be desirable to have a debate that enables the public to appreciate the nature and complexity of the scientific issues, so that they can form a reasoned view of the implications of the introduction of the procedure. To achieve that, both sides have to abandon the current linguistic framings, and find a common linguistic ground from which to argue their respective cases.

A Journey into Transcription, Part 3: Clarity

As audio transcribers we listen to sound.  Of primary importance is the clarity of the sound.



The quality of being clear (‘easy to perceive, understand, or interpret’), in particular:

  • The quality of being coherent and intelligible
  • The quality of being easy to hear; sharpness of sound
  • The quality of purity

Let’s consider these qualities and their relevance to the audio transcriber.

The quality of being coherent and intelligible

All of us, when engaged in discussion and conversation, want our language to be coherent and intelligible.  However, for the transcriber listening to a recording, its clarity in the sense of being coherent and intelligible is something of a paradox; it is simultaneously useful and yet also to be ignored.

Naturally, we know that our brains are programmed to attempt to organise and make sense of language.  In this sense, context can often present the transcriber with an invaluable clue to making out words which may be difficult to hear in a recording.

At the initial drafting stage of transcription what we hear at first can turn out to be quite different when we re-listen, edit and proofread the transcript with the glorious benefit of wider context to assist us.  Here are a few of the more entertaining examples:

you wear glasses becomes yoga classes

it’s among the becomes it’s a manga [comic]

yes she was becomes H G Wells

whisking gently becomes whiskey J&B [discussing a recipe!]

However, since the raison d’être of  this corpus is as a basis for research into the language of learners, part of the skill here is in not being distracted by our knowledge of grammatical rules and the surrounding context.

The audio transcriber’s task is to hear what the learner actually says; this may not always be what they (or we) think or expect might be logical or appropriate (or desirable!).  Indeed, the transcription conventions are designed specifically to minimise the possibility of this happening during the transcription process.  In the context of a Graded Examination in Spoken English (GESE) the students (and, on rare occasion, the examiners) can, and sometimes do, say anything!

Below are a few examples of wrong words and non-words which are to be transcribed, alongside words which may have been intended by the speaker:


Welcome our new CASS postgraduate students!

Last week, we had the pleasure of welcoming four new postgraduate students to the centre. Abi, Jennifer, Róisín, and Gillian have now joined last year’s postgraduates Robbie and Amelia in our ever-livelier corridors. These four represent a great range of interests (both academic and personal), and their research promises to be very exciting indeed. Introducing our new postgrads, in their own words:

Abi Hawtin

I’m currently in my first year of a 1+3 studentship at CASS.  My research is concerned with the methodological issues surrounding the building of corpora, but I’m also interested in how corpus approaches can be applied to critical discourse analysis, online communication, and the relationship between language and gender.

I grew up in Leamington Spa in the West Midlands, and then moved to Lancaster to study for my undergraduate degree in English Language and Linguistics here at Lancaster University. Before choosing my degree I had never even heard of ‘linguistics’, but came across it when trying to find a course that would combine my interests in language and science. I quickly discovered that linguistics is often defined as ‘the scientific study of language’ and haven’t looked back since! I became interested in corpus linguistics in my third year of undergraduate study, when we were shown how the combination of qualitative and quantitative methods could be used to provide insight into real world language use in many different areas of linguistics.

When I’m not working with words I can usually be found with my nose in a book (probably Harry Potter)!

Jennifer Hughes

I am a Research Student at CASS in the first year of my PhD in Linguistics. My PhD focuses on finding psycholinguistic evidence for collocation using EEG. I became interested in this topic whilst doing my BA in English Language and Linguistics at Lancaster, when I took modules in Psycholinguistics and Corpus Linguistics. I then developed this interest during my MA in Language and Linguistics, also at Lancaster, when I wrote a dissertation on how English collocations are processed by native speakers and learners of English.

During my PhD I am looking forward to gaining a more in-depth knowledge of Corpus Linguistics by, for example, exploring the different methods of extracting collocations from a corpus. I am also excited about learning how to use the EEG machine, conducting experiments, and learning more about Psychology in general.

Aside from my academic interests, I also really like dancing and do a variety of styles including tap, ballet, Irish, jazz, and contemporary.

Róisín Knight

I first came to Lancaster as an undergraduate studying English Language and Sociolinguistics. I absolutely loved my degree and enjoyed being introduced to many different areas of Linguistics. Once I had graduated, several lecturers parted with the words, “We’ve not seen the last of you… you’ll be back!”.

I then moved to London and trained at the Institute of Education to be a Secondary School English Teacher. I taught for two very crazy, exhausting but ultimately fun years. If there is one thing teaching taught me, it is the true meaning of the phrase ‘emotional rollercoaster’.

It turned out my lecturers were right; I soon missed being able to devote time to studying and completing my own research. I wanted a way to combine my interests in Linguistics with my teaching skills, and this sparked the idea for my PhD topic: investigating how corpus linguistic methods can aid the assessment of Key Stage 3 students’ creative writing.

I was fortunate enough to be offered 1+3 funding from ESRC, so I quit my teaching job (much to my students’ confusion- “how can you be a doctor without knowing medicine?”) and dragged my boyfriend back ‘up north’ (much to his displeasure- “but I don’t want to end up sounding northern!”). I’m really excited to be a new member of CASS, and I’m looking forward to providing updates on this website soon detailing some of the research I’ve been carrying out.

Gillian Smith

I am an MA student in the first year of a 1+3 PhD studentship. My research focus is the application of corpus-based approaches to the study of classroom interactions of children with communicative difficulties, specifically investigating how teaching strategies affect their linguistic and social development.

I grew up in a tiny village in the middle of Yorkshire that was so remote I inevitably became a bookworm and hence knew from an early age that I wished to pursue higher academic study. Having been inspired by an exceptional GCSE English teacher, I decided to pursue the subject further, taking A-level English Language and coming to Lancaster in 2011 to study BA English Language and Literature. In the final two years of my undergraduate degree I dropped literature to pursue my English Language studies and subsequently discovered my two main research interests: the study of communication disorders and corpus linguistics. Study of the linguistic manifestation of communicative disorders fascinated me and I was drawn to the widespread and practical applications that corpus linguistics offers.

As postgraduate study was always on my agenda, being given the opportunity to study my specific research interests in CASS was a dream come true. Through links with the centre I have already been given the chance to study in China for a month, which was an incredible experience, and I am looking forward to the continuing prospects that being a research student in CASS holds.

Are you a current postgraduate student interested in visiting Lancaster University for a research stay, or a current undergraduate student considering taking up a Masters or PhD featuring an element of corpus linguistics? Get in touch (write to cass(Replace this parenthesis with the @ sign)lancs.ac.uk) to see if there are any opportunities to work with CASS.

Remember also to check back periodically to hear updates on what our postgrads are studying and researching.

Brainstorming the Future of Corpus Tools

Since arriving at the Centre for Corpus Approaches to Social Science (CASS), I’ve been thinking a lot about corpus tools. As I wrote in my blog entry of June 3, I have been working on various software programs to help corpus linguists process and analyse texts, including VariAnt, SarAnt and TagAnt. Since then, I’ve also updated my mono-corpus analysis toolkit, AntConc, as well as my desktop and web-based parallel corpus tools, including AntPConc and the interfaces to the ENEJE and EXEMPRAES corpora. I’ve even started working with Paul Baker of Lancaster University on a completely new tool that provides detailed analyses of keywords.

In preparation for my plenary talk on corpus tools, given at the Teaching and Language Corpora (TaLC 11) conference held at Lancaster University, I interviewed many corpus linguists about their uses of corpus tools and their views on the future of corpus tools. I also interviewed people from other fields about their views on tools, including Jim Wild, the Vice President of the Royal Astronomical Society.

From my investigations, it was clear that corpus linguists rely on and very much appreciate the importance of tools in their work. But, it also became clear that corpus linguists can sometimes find it difficult to see beyond the features of their preferred concordancer or word frequency generator and attempt to look at language data in completely new and interesting ways. An analogy I often use (and one I detailed in my plenary talk at TaLC 11) is that of an astronomer. Corpus linguists can sometimes find that their telescopes are not powerful enough or sophisticated enough to delve into the depths of their research space. But, rather than attempting to build new telescopes that would reveal what they hope to see (an analogy to programming) or working with others to build such a telescope (an analogy to working with a software developer), corpus linguists simply turn their telescopes to other areas of the sky where their existing telescopes will continue to suffice.

To raise awareness of corpus tools in the field and also generate new ideas for corpus tools that might be developed by individual programmers or within team projects, I proposed the first corpus tools brainstorming session at the 2014 American Association for Corpus Linguistics (AACL 2014) conference. Randi Reppen and the other organizers of the conference strongly supported the idea, and it finally became a reality on September 25, 2014, the first day of the conference.

At the session, over 30 people participated, filling the room. After I gave a brief overview of the history of corpus tools development, the participants thought about the ways in which they currently use corpora and the tools needed to do their work. The usual suspects—frequency lists (and frequency list comparisons), keyword-in-context concordances and plots, clusters and n-grams, collocates, and keywords—were all mentioned. In addition, the participants talked about how they are increasingly using statistics tools and also starting programming to find dispersion measures. A summary of the ways people use corpora is given below:

  • find word/phrase patterns (KWIC)
  • find word/phrase positions (plot)
  • find collocates
  • find n-grams/lexical bundles
  • find clusters
  • generate word lists
  • generate keyword lists
  • match patterns in text (via scripting)
  • generate statistics (e.g. using R)
  • measure dispersion of word/phrase patterns
  • compare words/synonyms
  • identify characteristics of texts
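
Two of the uses listed above, keyword-in-context (KWIC) concordances and n-gram lists, can be sketched in a few lines of Python. This is only a minimal illustration: the function names `kwic` and `ngram_counts`, the context width and the sample sentence are assumptions made for this example, and the sketch does not reproduce the behaviour of any particular concordancer.

```python
import re
from collections import Counter


def kwic(text, pattern, width=20):
    """Return one keyword-in-context line per regex match of `pattern`."""
    lines = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the matches line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines


def ngram_counts(tokens, n=2):
    """Count every run of `n` consecutive tokens."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


# Invented sample text, for illustration only.
sample_text = "The battery is faulty. Replace the battery pack."
lines = kwic(sample_text, r"\bbattery\b")
for line in lines:
    print(line)

tokens = re.findall(r"[a-z]+", sample_text.lower())
bigrams = ngram_counts(tokens, n=2)
print(bigrams.most_common(1))
```

Using a regular expression as the search pattern keeps the sketch close to the "match patterns in text (via scripting)" item in the list as well.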

Next, the participants formed groups, and began brainstorming ideas for new tools that they would like to see developed. Each group came up with many ideas, and explained these to the session as a whole. The ideas are summarised below:

  • compute distances between subsequent occurrences of search patterns (e.g. words, lemmas, POS)
  • quantify the degree of variability around search patterns
  • generate counts per text (in addition to corpus)
  • extract definitions
  • find patterns of range and frequency
  • work with private data but allow  for powerful handling of annotation (e.g. comparing frequencies of sub-corpora)
  • carry out extensive move analysis over large texts
  • search corpora by semantic class
  • process audio data
  • carry out phonological analysis (e.g. neighbor density)
  • use tools to build a corpus (e.g. finding texts, annotating texts, converting non-ASCII characters to ASCII)
  • create new visualizations of data (e.g. a roman candle of words that ‘explode’ out of a text)
  • identify the encoding of corpus texts
  • compare two corpora along many dimensions
  • identify changes in language over time
  • disambiguate word senses
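
The first idea in the list, computing distances between subsequent occurrences of a search pattern, is simple to prototype. The sketch below is one possible reading of that idea, measuring distance in tokens; the function name `occurrence_gaps`, the tokenisation and the sample sentence are assumptions made for illustration rather than a specification agreed at the session.

```python
import re
import statistics


def occurrence_gaps(text, pattern):
    """Return token positions matching `pattern` and the gaps between them."""
    tokens = re.findall(r"\w+", text.lower())
    positions = [i for i, tok in enumerate(tokens) if re.fullmatch(pattern, tok)]
    # Distance, in tokens, from each occurrence to the next one.
    gaps = [later - earlier for earlier, later in zip(positions, positions[1:])]
    return positions, gaps


# Invented sample text, for illustration only.
sample_sentence = (
    "the battery failed so we replaced the battery and tested the new battery"
)
positions, gaps = occurrence_gaps(sample_sentence, r"battery")
print(positions, gaps)
if gaps:
    print("mean gap:", statistics.mean(gaps))
```

The spread of the gaps (for example their standard deviation) could serve as a simple dispersion measure: evenly spaced occurrences give similar gaps, while clustered occurrences give a mix of very short and very long ones.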

From the list, it is clear that the field is moving towards more sophisticated analyses of data. People are also thinking of new and interesting ways to analyse corpora. But, perhaps the list also reveals a tendency for corpus linguists to think more in terms of what they can do rather than what they should do, an observation made by Douglas Biber, who also attended the session. As Jim Wild said when I interviewed him in July, “Research should be led by the science not the tool.” In corpus linguistics, clearly we should not be trapped into a particular research topic because of the limitations of the tools available to us. We should always strive to answer the questions that need to be answered. If the current tools cannot help us answer those questions, we may need to work with a software developer or perhaps even start learning to program ourselves so that new tools will emerge to help us tackle these difficult questions.

I am very happy that I was able to organize the corpus tools brainstorming session at AACL 2014, and I would like to thank all the participants for coming and sharing their ideas. I will continue thinking about corpus tools and working to make some of the ideas suggested at the session become a reality.

The complete slides for the AACL 2014 corpus tools brainstorming session can be found here. My personal website is here.

Sweepyface: a linguistic profile

This morning brought news of the suicide of a media-branded ‘troll’[1]. Brenda Leyland, the 63-year-old woman behind the @sweepyface Twitter account, a self-proclaimed “researcher” and “anti-McCann” advocate, was found dead at a Marriott hotel in Leicester on Saturday 4th October. She had recently been contacted by a reporter at Sky News regarding her Twitter activity, which frequently suggested that the disappearance of Madeleine ‘Maddie’ McCann in May 2007 was being covered up to the profit of her parents, Kate and Gerry McCann.

Part of an online community of “antis” – people who challenge the McCanns’ account of Maddie’s disappearance – Leyland frequently posted under the @sweepyface Twitter handle, tagging her posts with #mccann. “Antis” distinguish themselves from “pros”, or “pro-mccann” advocates, who believe the McCanns’ account of their daughter’s disappearance.

Here, we offer a brief and broad analysis of the content flowing to and from the @sweepyface Twitter account during the entirety of 2014, including the language use and online networks in which @sweepyface operated.

Please note that our analysis does not attempt to validate any claims made by any party with regard to the disappearance of Madeleine McCann.

Who is sweepyface?

As presented on analysis of the @sweepyface Twitter account

  • Description: researcher
  • Location: London/Los Angeles

What was sweepyface talking about?

  • Number of tweets sent between Jan-Oct 2014: 2,136
  • Tweets by sweepyface which contained the ‘#mccann’ hashtag: 1,992 (93.26% of all tweets in 2014)
  • Frequency
    • We looked at the most frequent words used by @sweepyface in all the tweets sent during 2014. After cutting out frequent grammatical words (like to, the, is, of, and, etc.), which don’t typically reveal much about content, we found that the most frequent things talked about were:
    • “K & G” – freq 222
      • K & G was used as shorthand to refer to Kate and Gerry McCann, Madeleine McCann’s mother and father. They were one of the most frequent topics of interest
      • ‘Kate’ and ‘Gerry’ also appeared, but less frequently (60 times and 39 times, respectively) and were never referred to using their full names, Kate McCann/Gerry McCann
    • “shills”
      • ‘Shills’ was the most frequent lexical word used (unlike grammatical/functional words, lexical words have clear semantic meaning – they belong to word classes like nouns and verbs).
      • It was almost unique to sweepyface – it was characteristic of her particular way of framing “pros”
      • Shills was used as a catch-all term to talk about:
        • those who would express “pro-mccann” opinions – “pros” and “shills” appear to be interchangeable
        • those who would oppose the opinions of “antis”
      • It was also used as an in-/out-group identifier
    • “police”
      • mostly used to question police practices, as in the following Tweet from sweepyface:
        • “#mccann  Rarely a month goes by when our police force are not highlighted as having flawed investigations, PJ is no worse than any other”
      • sweepyface also tweeted the police directly, as in the following examples:





19/03/2014 15:32

@metpoliceuk  This is becoming farcical Why will you not consider McCanns as suspects, plenty of clues

08/08/2014 15:31

@gracey52marl @metpoliceuk  #mccann  Not me, I wd like to see Gerrie Nell, prosecute the Mcanns, he wd tear them to shreds

Who did sweepyface affiliate with and what did they say?

We examined only the top 10 accounts with which @sweepyface had the most interaction. These accounts were:

Rank Account name # of interactions Group
2 TrulyJudy73 456 pro
3 martin_liz 445 anti
4 siamesey 417 anti
5 RothleyPillow 393 anti
6 AdirenM 323 anti
7 1matthewwright1 314 anti
8 ModNrodder 309 pro
9 B_balou 256 anti
10 basilandmanuel 250 pro

Sweepyface most frequently associated directly with others who were actively engaged in talk about the disappearance of Madeleine McCann, whether as a “pro” or as an “anti”. Moreover, contact between these accounts was evident and many more accounts were frequently interacting with sweepyface on the same topic.

 [more to follow]

[1] We argue that ‘troll’ as used by the media is defined too broadly – it captures behaviours from low level insults to rape and death threats – and is thus harmful. We adhere instead to the definition of ‘troll’ given here: http://cass.lancs.ac.uk/?p=621

Workshop on ‘Metaphor in end of life care’ at St Joseph’s Hospice, London

On 26th September 2014, three members of the CASS-affiliated ‘Metaphor in end of life care’ project team were invited to run a workshop at St Joseph’s Hospice in London. The workshop was attended by 27 participants, including clinical staff, non-clinical staff and volunteers.

Veronika Koller (Lancaster University) introduced the project, including its background, rationale, research questions, data and use of corpus methods in combination with qualitative analysis. Zsófia Demjén (The Open University) and Elena Semino (Lancaster University) presented the findings from the project that are particularly relevant to communication between healthcare professionals and patients nearing the end of their lives. These findings include: how patients diagnosed with terminal cancer use Violence and Journey metaphors to talk about their experiences of illness and treatment; and how patients and healthcare professionals use a variety of metaphors to talk about their mutual relationships. The project team pointed out the different ‘framings’ provided by different uses of metaphor, particularly in terms of the empowerment and disempowerment of patients. They provided evidence that no metaphor is inherently good or bad for all patients, but rather suggested that different metaphors work differently for different people, or even for the same person at different times. In the final session, Veronika Koller introduced the ‘Metaphor Menu’ – a collection of metaphors used by cancer sufferers, which the team are planning to pilot as a resource for newly-diagnosed patients.

A lively discussion followed each presentation, with many members of the audience asking questions and contributing their personal and professional experiences. The workshop received very positive evaluations in anonymous feedback questionnaires: 83% of participants rated the session at 4 or 5 on a 5-point scale (where 1 corresponds to ‘Very poor’ and 5 to ‘Excellent’). Comments included: ‘Very interesting research & resonated with my experience. Food for thought!’ and ‘Will help with my area of care, will help me understand and think about what my patients and relatives are actually telling me. Will make me reflect and respond more appropriately’.

Welcoming the new members of the Climate Change team

We are delighted to announce that Dr. Marcus Müller from the University of Heidelberg (Germany) and Dr. Maria Cristina Caimotto from the University of Torino (Italy) have kindly agreed to join the CASS Changing Climate project, led by Professor John Urry.

They both will have a lot to contribute to the project. Their experience and language skills will allow us to broaden the project’s scope and also examine the discourses around climate change issues in German and Italian newspapers.

Dr. Marcus Müller is a senior lecturer in German linguistics at the Department of German in the University of Heidelberg, Germany. He is also an associate member of the Heidelberg Centre for Transcultural Studies (HCTS) and a teaching fellow of the Heidelberg Graduate School for Humanities and Social Sciences (HGGS). He has also been a visiting lecturer at the universities of Paderborn and Düsseldorf as well as at the universities of Tashkent, Budapest and Beijing. Dr. Marcus Müller is the founder and spokesman for the German-Chinese graduate network “Sprachkulturen – Fachkulturen” and the “Language and Knowledge” Graduate Platform (http://en.sprache-und-wissen.de/). His research interests include corpus linguistics, discourse analysis, grammatical variation, language and social roles, language and art. You can find more about him at http://www.gs.uni-heidelberg.de/sprache02/mitarbeiter/mueller/index.html

Dr. Maria Cristina Caimotto is a research fellow in English Language and Translation at the Department of Culture, Politics and Society of the University of Torino. She is also a member of the Environmental Humanities International Research Group. Her research interests include translation studies, political discourse and environmental discourse. In her work, the contrastive analysis of texts in different languages (translated or comparable) is employed as a tool for critical discourse analysis.

The Scottish referendum – did it unite the Guardian and the Mail?

The Guardian and the Mail are very different newspapers. The Guardian is a left-leaning liberal broadsheet while the Mail is a more popular right-leaning ‘middle-market’ newspaper. Generally, they can be relied on to disagree with one another on a range of social, economic and political issues. However, both newspapers supported the recent “No” campaign during the Scottish Independence referendum, which raises a few interesting questions – how did their discourse around Scottish independence contrast? Did they use similar arguments and language, or did they still manage to retain their individual identities?

To explore these questions, we built corpora of the Mail and Guardian (and their Sunday editions) from 18 June 2014 until 18 September 2014 (the three months leading up to the referendum on Scottish independence) by collecting all articles which contained the term Scottish directly followed by independence, referendum, vote or poll.

We then examined the keywords which emerged when each corpus of articles was compared against the 1 million word BE06 Corpus of general British English. A keyword is simply a word which occurs much more often in a corpus when compared against a larger reference corpus. Corpus tools (we used Antconc) can quickly calculate keywords by conducting statistical tests on all the words in the corpus. We looked at the strongest (in terms of statistical saliency) 100 or so keywords for each corpus, and then compared the two sets of keywords to see which occurred just in the Guardian or just in the Mail, but also which were shared by both. The table below shows the keywords that were found.

Guardian keywords: austerity, Britain, Brown, campaigners, Carrell, country, devolution, EU, festival, Holyrood, ISIS, nation, nationalism, north, oil, political, politicians, politics, polling, polls, powers, Saturday, says, secretary, Severin, voted, votes, voting, weather, YouGov

Keywords in both newspapers: Alex, Alistair, all, August, bank, BBC, better, border, Cameron, campaign, currency, Darling, David, debate, Ed, Edinburgh, election, former, Games, Glasgow, has, independence, independent, July, Labour, leader, London, Miliband, minister, MPs, nationalists, No, party, poll, prime, pro, referendum, Salmond, Scotland, Scots, Scottish, September, SNP, tax, Thursday, Together, Tory, UK, undecided, union, vote, voters, Westminster, will, would, Yes

Mail keywords: Balmoral, border, cabinet, CBI, chairman, crisis, investors, James, Kingdom, MP, PM, prince, Queen, said, shares, sterling, Tories, Tuesday, twitter, uncertainty, United, warned, week, year
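For readers curious about what a keyword calculation involves, here is a minimal Python sketch. It assumes the log-likelihood statistic, which is one common choice for measuring keyness; it is not a record of the exact settings behind the table above, and the function names (log_likelihood, keywords, compare) are purely illustrative.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness score for one word.

    a = frequency in the study corpus, b = frequency in the reference corpus,
    c = size of the study corpus,      d = size of the reference corpus.
    """
    # Expected frequencies if the word were spread evenly over both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(study_tokens, reference_tokens, top_n=100):
    """Return the top_n words that are unusually frequent in the study corpus."""
    study, ref = Counter(study_tokens), Counter(reference_tokens)
    c, d = sum(study.values()), sum(ref.values())
    scored = []
    for word, a in study.items():
        b = ref.get(word, 0)
        # Keep only words that are relatively MORE frequent in the study corpus.
        if a / c > b / d:
            scored.append((log_likelihood(a, b, c, d), word))
    scored.sort(reverse=True)
    return [word for _, word in scored[:top_n]]


def compare(keywords_a, keywords_b):
    """Split two keyword lists into (only in A, shared, only in B)."""
    set_a, set_b = set(keywords_a), set(keywords_b)
    return sorted(set_a - set_b), sorted(set_a & set_b), sorted(set_b - set_a)
```

Running keywords() once per newspaper against the same reference corpus, and then passing the two lists to compare(), gives a three-way split of the kind shown in the table: keywords found only in one paper, and keywords shared by both.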


This table isn’t really an analysis though – we need to explore the keywords in more detail by reading the articles that each keyword appears in and getting a sense for how and why they were used. This is achieved by looking at concordance lines, although we can also expand each line to read the entire article. Here are some of our preliminary findings.

The Mail was much more concerned than the Guardian about how the vote would impact on the Royal Family, with its keywords including Prince, Queen and Balmoral. Much is made of the Queen’s ‘neutrality’, her relationship with David Cameron, her ‘soft power’ in influencing the vote, her carefully calculated comments, and, characteristically, what she is wearing (“a turquoise outfit and hat” in one article). The Queen is also described as receiving daily updates from Balmoral.

The Mail also refers to the keyword uncertainty a lot more than the Guardian, particularly appearing concerned about how the progress of the campaign is bad for markets, investors, businesses and pension holders who don’t like uncertainty e.g. “uncertainty is the enemy of investment”. The use of the Mail keyword crisis also pins the Scottish vote to the idea of a crisis – the vote could trigger an “EMU-style currency crisis within the UK” but there could also be a “leadership crisis” for both Labour and the Conservatives. Another somewhat worrying Mail keyword is warned, with the Mail reporting various people and businesses (Stagecoach, Paul Krugman, Goldman Sachs, Standard Life, John Major, Mark Carney, Doug Flint) issuing warnings about a range of dire consequences that could occur if Scotland gains independence.

Perhaps surprisingly, twitter is a keyword for the Mail. This is interesting given that the Mail’s editor, Paul Dacre, dismissed the ‘firestorm’ of tweets around a previous Mail article by Jan Moir, which back in 2009 attracted the highest number of complaints ever made to the Press Complaints Commission. But the Mail now seems to have accepted the importance of Twitter and views tweets as newsworthy. To wit, it reports on Rupert Murdoch’s Twitter behaviour, as well as tweets from people who disliked the Better Together advertising campaign #PatronisingBTLady. The Mail is especially disapproving of “tartan trolls” who use Twitter to attack celebrities like JK Rowling who endorse the No vote.

How about the Guardian? One keyword it used was nationalism, which at first glance may suggest that the Guardian wished to critique the “Yes” voters as nationalists. However, there were cases where writers like Billy Bragg and George Monbiot argued that the label of nationalism was unfairly used to obscure ‘self determination’. One journalist approvingly refers to the lack of ‘braveheart nationalism’ in the campaign, although other journalists do attribute nationalism to some Scottish people, but this is felt to be due to London being out of touch and inward looking. Nationalism either doesn’t exist in the campaign, or when it does, can be excused.

Another Guardian keyword is austerity, with some journalists citing views that the current government’s austerity programme is helping the yes camp. This could be an opportunity for the Guardian to blame the government’s economic policy for breaking up the union, but generally this is not done and instead, it is argued that a Yes vote would not end austerity, but merely impose it from Holyrood rather than Westminster.

Unlike the Mail, the Guardian doesn’t spend as much time reporting the warnings of ‘financial experts’, although the keyword oil was interesting, occurring with reference to North Sea Oil reserves and revenues. In a number of articles, the Guardian foregrounds claims by Sir Ian Wood that Alex Salmond has exaggerated North Sea Oil reserves by up to 60%. In terms of perspectivation, Sir Ian Wood’s position is given precedence over Salmond’s e.g. Wood is described as ‘one of the most influential figures in the Scottish oil industry’ and other people are described as quoting his position too. A woman who claims that the No campaigners have ‘downplayed the amount of oil we have left’ is subtly positioned as greedy: ‘It was “our oil”, she said…’ and thus her argument is weakened somewhat. At the end of the same article, another opinion, given by a local Lib Dem chairman who is described as a ‘marine engineer’ appears to be given more precedence: he says ‘Nobody knows how much oil is there’. The Guardian may not know how much oil there is, but it manages to do a good job of casting enough seeds of doubt to make us think that neither does Alex Salmond.

Finally, both newspapers had Yes as a keyword. How did they represent the yes campaigners? The Guardian made reference to yes voters who are starry-eyed, fierce, enterprising, determined, hardline, vocal and proud. It has very little to say about the no voters, indicating a somewhat subtle sense that the yes voters are a little pushy in their sentiments. The Mail doesn’t mention characteristics of the yes voters much, although it does refer to Alex Salmond as shouty and describes the no campaign as floundering and lacklustre.

So, while both newspapers generally supported Scotland staying within the UK, they each did it by using different strategies and in a way which helped them to maintain their own identities, reflecting the concerns and interests of their readers. From this admittedly preliminary analysis it is difficult to draw a confident conclusion but the Guardian did appear to make more of an effort to allow a range of positions to be represented, and was somewhat more subtle in its disapproval of the ‘yes’ campaign. The two newspapers did have different strategies on what they said about each other with respect to the campaigning. The Mail barely mentioned the Guardian, only referring a couple of times to a Guardian poll that put Alistair Darling as scoring a victory over Alex Salmond during a two hour debate. The Guardian was more critical of the Mail, however, using the campaigning to get in a few digs at the Mail. One writer sneeringly referred to ‘the Daily Mail’s insistence that anyone who wants to see a fairer society must be a Stalinist’. And another Guardian columnist expressed surprise that ‘I’m on the same side as the Daily Mail too! Which appears to be taking a short break from convincing us the UK has gone down the tubes to press home a slightly perplexing message of: hey, please don’t break up this wonderful hideous slutty drunken immoral country where women, gays and foreigners don’t know their place!’

Now the vote is over, the two newspapers can get back in their respective bunkers.

Swimming in the deep end of the Spoken BNC2014 media frenzy

As someone who enjoys acting in his spare time, I’m rarely afraid of the chance to spend some time in the spotlight. But as I sat one morning a few weeks ago in my bedroom, in nothing but a dressing gown, about to do a live interview on a national Irish radio station, with no kind of media training or experience under my belt, I really did get a case of the nerves. I would spend the entire day appearing on over a dozen radio and TV broadcasts (thankfully with time to get dressed after the first), promoting participation in the Spoken BNC2014 project, and finding out the true meaning of the phrase ‘learning on the job’. My experiences taught me a few things about the relationship between the broadcast media and academic research, which I’ve summarised at the end of this blog.

In late July, CASS and Cambridge University Press announced a new collaboration which aims to compile a new spoken British National Corpus, known as the Spoken BNC2014. This is an ambitious project that requires contributions of recordings from hundreds, if not thousands, of speakers from across the entire United Kingdom. As a research team (which includes Lancaster’s Professor Tony McEnery, Cambridge’s Dr Claire Dembry, as well as Dr Vaclav Brezina, Dr Andrew Hardie, and me), we knew that we had to spread the word far and wide in order to drum up the participation of speakers across the country.

So, at the end of August, we put out a press release which teased some preliminary observations, and invited people to get involved by emailing corpus@cambridge.org. These findings were based on some basic comparisons between the relative frequencies of the words in the demographic section of the original spoken BNC, and those of the first two million words collected for the Spoken BNC2014 project. We put out lists of the top ten words which had fallen and risen in relative frequency the most drastically between the 1990s data and today’s data.

Words which had declined    Words which had risen
fortnight                   facebook
marvellous                  internet
fetch                       website
walkman                     awesome
poll                        email
catalogue                   google
pussy cat                   smartphone
marmalade                   iphone
drawers                     essentially
cheerio                     treadmill
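To give a flavour of the comparison behind lists like these, here is a rough Python sketch. It assumes that "most drastically" means the largest difference in frequency per million words, which is only one possible measure and not necessarily the one used for the press release; the function names per_million and biggest_changes are made up for illustration.

```python
from collections import Counter


def per_million(count, corpus_size):
    """Normalise a raw count so corpora of different sizes can be compared."""
    return count * 1_000_000 / corpus_size


def biggest_changes(old_tokens, new_tokens, top_n=10):
    """Return (most declined, most risen) words between two corpora."""
    old, new = Counter(old_tokens), Counter(new_tokens)
    old_size, new_size = sum(old.values()), sum(new.values())

    # Change in relative frequency for every word seen in either corpus.
    change = {
        word: per_million(new[word], new_size) - per_million(old[word], old_size)
        for word in set(old) | set(new)
    }

    # Sort by change, using the word itself to break ties consistently.
    ranked = sorted(change, key=lambda word: (change[word], word))
    declined = ranked[:top_n]
    risen = ranked[::-1][:top_n]
    return declined, risen
```

Normalising to a per-million rate matters because the two datasets are different sizes, so raw counts alone would not be comparable.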

It seems that these words really captured the imagination of the media powers that be. On the week of the release at the end of August, I was told on the Monday afternoon that the release had been sent out. By late that night, the story had already been picked up by the Daily Mail. Such was my joy, and perhaps naivety, that I sent out a brief and fairly humble blog post celebrating the fact that one person from one newspaper had run an article on our story. What I didn’t realise at the time was that, had I put out a blog post every time we discovered a piece of coverage the next day, I would still be writing them now.

The next morning I was woken by a message from Lancaster Linguistics and English Language department’s resident media celebrity, Dr Claire Hardaker, asking urgently for some information about the Spoken BNC2014 project. She had been contacted by LBC Radio, who had caught wind of the story and assumed sort-of-understandably that, since it was a linguistics story that involved Lancaster University, Claire would be directly involved. She isn’t, sadly, but they had lined up a live interview with her in twenty minutes’ time regardless, and she had kindly agreed to do it anyway with what information I could get to her in time.

After that, I soon realised that perhaps this story would garner more interest than a few newspaper articles. My phone went into meltdown, bleeping with emails from the PR team at the university and phone calls from unknown numbers. There was a 90-minute period where I couldn’t leave my room to get a shower, get dressed, and get on to the campus, simply because I was being lined up for so many interviews throughout the day. As such, I had to do my first there and then, in my dressing gown, while Claire Hardaker kindly waited on stand-by in the university press office in case I couldn’t make it to campus on time for my next.

Once I got there, it was a busy day of interviews right through to 6pm that evening. Over the course of the day, I was interviewed by international radio stations BBC World Service and Talk Radio Europe, UK national stations BBC Radio 4, Sky Radio, and Classic FM, Irish national station Today FM, and Russian national station Voice of Russia UK. I was also interviewed by UK regional BBC news stations London, Merseyside, Coventry & Warwick, Lancashire, and Three Counties. The highlight for me though was the TV interview with the Sky News channel, which I recorded using the Skype app on my little Windows tablet. The interviewer could see me, but I couldn’t see her (or indeed hear her all that well), and I had no idea that she was set up in the studio and that the video would be edited together and released that day. Aside from being shown on the Sky News television channel itself, and their website, the interview appeared on upwards of 40 regional radio websites, including Rock FM, Magic FM, The Bee, North Sound, Yorkshire Coast Radio, Wave 965, and Juice Brighton, as well as other media sites. Claire Dembry also got involved from Cambridge, doing further TV interviews with Sky News and even joining me for a live double interview with BBC Radio London.

So, what did I ‘learn on the job’ through my baptism of fire in the media world? Three main points:

  • Some interviewers thought I was announcing the death of the English language

Though most of the interviews went about as smoothly as I could have expected, with me remembering to plug the email address corpus@cambridge.org at any given opportunity, some were much harder work. Some interviewers seemed horrified at the thought of ‘losing’ words such as marvellous and cheerio, and wanted me to tell them what they could do to help rescue them. Though it was tempting to say “well if you keep saying them they won’t disappear…”, I instead politely made the point that language, like everything else to do with being human, changes over time, and that this is perfectly okay. Just like fashion. This ‘endangered species’ discourse came up in a few interviews, and it seemed that the interviewers felt I was suggesting that the English language was somehow shrinking or degrading over time.

  • Some interviewers thought I was actively promoting the changes I was reporting

In other cases, the interviewers seemed to imply that I was making recommendations for the words that speakers should avoid or should start saying more, in order to ‘stay up to date’ and not come across as ‘old fashioned’. In other words, I was mistaken for a prescriptivist rather than a descriptivist: someone who was trying to stop people from using the word catalogue, or encouraging everybody to say the word treadmill at least five times a day.

  • Some interviewers asked ‘nice’ questions, and some didn’t

This is a more general observation which I suspected to be the case before I started, and had it confirmed as the interviews went on. It is a simple truth that the interviewers who ‘got’ the project the most were the ones who, for me, asked the best questions. When being interviewed about the list of words which have decreased in frequency I was, in varying forms and among many others, asked the following two types of question:

A: The words which were more popular in the 1990s but not so much now – tell me about ‘pussy cat’ – what’s going on there?

B: The words which were as popular in the 1990s as Facebook is now – I guess words like ‘marvellous’ and ‘catalogue’ are harder to spell and we’re getting lazier these days so we’re just going to say shorter words aren’t we?

For me, and I imagine many others, question A is the ‘nice’ question of this pair. The interviewer draws me to one example which looks interesting – fair enough – but importantly they make no inference themselves about the possible explanation. They set up a blank canvas and allow me to paint it in the way which is most advantageous to my purpose.

Question B, however, is much more problematic for me as the interviewee and sadly occurred as much as, if not more than, those like question A. Firstly the interviewer has re-conceptualised the findings and created equivalence between the frequency of the declining words and the words on the rise. Therefore the possibility for conclusions like “marmalade used to be as popular as Facebook” or, worse, “iPhones replace pussy cats in British society” is opened up and thrown into the ether.

Secondly, and much harder to deal with immediately, is the lumping of two completely unrelated words (marvellous and catalogue), the assumption of societal degradation (we’re getting lazier), the pseudo-logical causal relationship between written conventions and spoken interaction (harder to spell), which are based on such assumptions of societal degradation (so we’re just going to say shorter words), and, the icing on the cake, the tag question which invites me to agree that everything the interviewer has just said is perfectly correct (aren’t we?). Yes, this is indeed not a nice question. The strategy I developed is to say that yes, everything you have just said could be the case, and then to go about repackaging their question into something more reasonable for me to say anything about. This was not easy and in some cases I did this better than others!

The recurring theme of my experience was the extent to which the interviewers’ expectations of the Spoken BNC2014 research matched what we are actually trying to do. Most of the time, there was a close match and the questions fit my aims well. In the cases where this didn’t happen, and the questions made all sorts of false assumptions, life was more difficult. I don’t think, however, that anyone was deliberately misconstruing our humble aims, and really I’d rather have given those difficult interviews, where I felt like I was in a fight for mutual understanding, than not to have given them at all for fear of being misunderstood. It seems that this is an inevitable aspect of daring to throw your work out of the bubble of academia and into the public sphere, where it really matters. My goal for next time is to improve the way that the research is communicated in the first place, and to plug potential potholes of misunderstanding in a way that is as accurate as reasonable but still makes a good story.

Overall, I think I managed as well as I could have done, given the abrupt start to the day and my naïve expectation that the press wouldn’t be as interested in the story as it turns out they were. Hopefully we’ll have generated lots of interest in the project. I’d like to thank Claire Hardaker for helping me learn the ropes as I went along, the staff at Lancaster University’s press office for keeping me in the right place at the right time, and the ESRC, who have since offered me some media training, which I will very gladly accept. Awesome!

Corpus linguistics MOOC: Second run beginning soon

We are running the corpus MOOC again – and we are really looking forward to it. In the first run of the course we taught social scientists and other researchers from across the globe about how to use corpus linguistics to study language. We looked at a range of topics of contemporary social relevance in doing so – including how we talk about disability and how newspapers write about refugees. We also looked at key areas where corpus linguistics has contributed greatly, notably the areas of dictionary construction and language teaching.

The result, I must say, exceeded our expectations – which were pretty high. People really seemed to like the course and get a lot from it. Even though the approach was entirely new to most students, a very large number worked through all eight weeks of the course. The feedback on our training has been exceptionally strong – a look at the #corpusMOOC hashtag on Twitter will give a good idea of the overwhelmingly positive response to that course. The following quote, from a Chinese notice board on which our MOOC was discussed, gives a strong sense of how the course succeeded both in training students and in showing them that corpora have a key role to play in exploring social science questions (thanks to Richard Xiao for the translation):

“CorpusMOOC, with its assembly of the best corpus linguists and rich content, cannot be praised enough … The greatest benefit for me has been that the course has widened my vision: corpus linguistics and the applications of corpus technologies have gone far beyond what I had imagined – more resembling big data in the field of social science research instead of being confined to linguistics… I think the significance of this course lies not merely in teaching a large number of corpus techniques but more, rather, in introducing corpora and demonstrating what corpora can be used for, thus making us aware of them and helping us understand their importance … the corpus-based approach is the unavoidable approach to language in future.”

The first run of the MOOC had a great impact – the course was taken mainly by women (70.44% of students), and drew participants from all continents and a wide range of countries – including places as far flung as the British Antarctic Territory! The areas in which course participants were working and researching were heavily oriented to the social sciences, with students drawn from areas such as business consulting and management, health and social care and media and publishing. The greatest contribution of the course, however, seems to have come from providing training to teachers/lecturers in the UK and beyond. Given that the great majority of students were taking the course for career development (78.59%), the course was likely not only to have had a strong effect on this group but also, by extension, on the students who are exposed to the ideas in the course by the teachers/lecturers who took it.

Having read this, you can probably understand why we were keen to run the course again. Through it we have been able to get a good understanding of corpus linguistics across to thousands of people around the globe. We have made a few changes to the course based on the feedback we received – all designed to make a good course better! This includes new lectures (for example on the language used in cancer treatment) and new ‘in conversation’ pieces with corpus linguists (such as Douglas Biber).

If this run of the course proves as popular as the first, which we think it should, we plan to run the course every September. Who knows when we will stop!

For a limited time, registration is still open. Book your place on ‘Corpus linguistics: method, analysis, interpretation’ now.