Using the law to challenge cultures of hate

Outlawing homophobic and transphobic hate crime in Europe

All crimes hurt in one way or another: emotionally, physically, or economically. Yet an accumulation of research evidence now shows that, as a category of crime, hate crimes hurt more on average than otherwise motivated crimes. Hate crime victims are more likely to report experiencing post-victimisation emotional and psychological distress.

The greater harms inflicted by hate crimes provide the justification for hate crime laws. Any objections that such laws restrict freedom of speech fail to acknowledge that the expressive evidence by which we come to recognise hate crime rarely consists of what we might conventionally call ‘speech’. ‘Invective’ is a more accurate word. And it is likely that in a majority of hate crimes the spewing of hateful invective is the sole act by the perpetrator.

Last week, on International Human Rights Day, I addressed a seminar in Brussels organised by ILGA-Europe, the International Lesbian, Gay, Bisexual, Trans and Intersex Association of Europe, which brought together LGBTI organisations from across Europe with EU policy makers and other experts on hate crime. I asked why, given that the harms of homophobic hate crime are well documented (and, by inference, harms of the same gravity are likely to be inflicted by transphobic hate crime, although these are less well documented), these crimes are not treated equitably in Europe with racist crimes, which are subject to the 2008 EU Framework Decision on Racism and Xenophobia. The Framework Decision includes an obligation for the criminal jurisdictions of Member States to consider racist or xenophobic motivation as an aggravating circumstance, or alternatively for the courts to take such motivation into account, in the determination of penalties for convicted offenders.

How hate hurts

A victim’s personal biography will shape how they react to crime. There will therefore be considerable variation in the reactions of hate crime victims — as there will be with victims of otherwise motivated crime. However, it is clear that as a group, hate crime victims report greater hurts when responding to crime surveys.

There are a number of reasons that might be proffered to explain this phenomenon. Chief among them is that hate crimes are ‘message crimes’. Intentionally or not, perpetrators send a message that their target is disparaged, denigrated, marginalised. The message strikes at the core of the victim’s identity. But the message is not personal. It is not about the particular individual on the receiving end. It is about what their identity represents in the particular cultures in which hate violence is nested. Such representations give permission for violence.

And violence should not be thought of only as physical attack. We might also think of violence as ‘violence of the word’, to use a phrase coined some years ago to characterize threats, slurs, epithets and other forms of verbal denigration and hateful invective.[1]

We all know the common terms of abuse that constitute homophobic and transphobic invective. I won’t parrot them here. Often, these offences are referred to as ‘low level’ because they don’t inflict physical injury. But the mental wounds can be severe as evidenced by studies of post-victimisation distress and the testimony of victims.

We all know the old adage ‘sticks and stones may break my bones but words will never hurt me’. What an awful lie. The emotional wounds inflicted by hateful invective can linger long beyond the time it takes for physical injuries to heal.

A disturbing picture of this type of victimisation was revealed recently by one of the most comprehensive surveys to date of the experience of discrimination, harassment and violence against lesbian, gay, bisexual, transgender and queer people, carried out by the European Union Fundamental Rights Agency in 2012.[2] Almost a fifth (19%) of the 93,079 people aged 18 or over across the 27 EU Member States and Croatia who responded to an online questionnaire said that they had been victims of harassment in the past year, partly or completely because they were perceived to be lesbian, gay, bisexual or transgender. Lesbian women were the most likely to have been harassed (almost a quarter, 23%, in the last year), along with transgender respondents, of whom 22% had been harassed in the preceding 12 months.

Sending a message about hate violence

Just as hate crimes are message crimes, the omission of homophobic and transphobic hate crime from the scope of the 2008 EU Framework Decision unfortunately, if unintentionally, sends a message: a message that such crimes are considered to be less serious than racist crimes.

The communicative function of the law cannot be overstated. To understand this, in the case of hate violence, we need to understand the cultural context of such violence. It is nested in cultures of bigotry, prejudice, stereotypes, and narratives about difference, real or imagined. Challenging hate violence therefore necessitates challenging the cultural values which spawn hate. In this context the law plays a crucial role in constructing resistant narratives against the attitudes and values, or in other words the cultures, underpinning hate violence.

Law and culture are deeply interwoven. The law is not simply an autonomous product of culture. The law is constitutive of culture itself by providing a narrative of how a society seeks to visualize itself and envisions relations between its members. The law therefore not only sends a message of condemnation of hateful behaviour: it is the message.

In response to homophobic and transphobic violence there is a need to send a message back that such violence is abhorred no less, and no more, than racist violence. The same message needs to be sent about other forms of discriminatory violence. Laws against hate crime are therefore a vital component of the counter-narrative against the cultural values which spawn hate violence.

Using the law in this way to challenge culture does not impose a cultural straitjacket. In the case of hate violence, by seeking to restrain and alter aspects of culture that are destructive of human interaction, the law seeks to lay the foundations for the dynamic evolution of our communities by denouncing the values that underpin hate violence and, by implication, promoting respect for diversity and difference.

While hate crime laws are an essential cultural force, it does not necessarily follow that all offenders should be subject to punitive measures upon conviction. It needs to be acknowledged that the culpability of offenders is shared with the communities where the cultural values which spawn hate violence reside. Many offenders who perhaps lash out in the heat of the moment, or who perhaps have a laugh at the other person’s expense, or perhaps go along to get along with friends, may not be aware of the full depth of hurt they inflict on their victim when they vent commonly held bigotry. In many cases, rehabilitative interventions, or some other form of therapeutic intervention aimed at helping the offender to begin to address the personal and social contexts for their offending, will be more appropriate and just.

To get to that point, though, criminal justice systems across Europe need to take homophobic and transphobic hate crime seriously so that victims and offenders might receive an appropriate response. In this context, the target of hate crime laws is not only the everyday cultures in which hate violence is nested. Such laws are also targeted at the cultures of criminal justice organisations to promote understanding of what hate crime involves and why it should be taken seriously. The 2008 Framework Decision on Racism and Xenophobia has prompted some EU Member States to extend legal measures against hate crime to include homophobic and transphobic violence. We now need all Member States to establish inclusive hate crime laws to address such violence and to end an indefensible double standard whereby violence on the basis of a person’s sexual orientation or sexual identity is treated less seriously in some countries than in others, and less seriously than other forms of discriminatory violence.

Paul Iganski is Professor of Criminology and Criminal Justice in the Lancaster University Law School, UK. His latest book, Hate Crime: A Global Perspective, written together with Jack Levin from Northeastern University’s Brudnick Center on Violence and Conflict, in Boston, will be published in May 2015. Paul is on the Management Board of the Lancaster University ESRC Centre for Corpus Approaches to Social Science (CASS) and leads a CASS research project on the management of hateful invective by the courts.

[1] Matsuda, M. (1989) ‘Public responses to racist speech: considering the victim’s story’, Michigan Law Review, 87, pages 2320-2381.

[2] European Union Agency for Fundamental Rights (FRA) (2013) European Union Lesbian, Gay, Bisexual and Transgender Survey, Vienna: European Union Agency for Fundamental Rights.

New CASS project: Big data media analysis and the representation of urban violence in Brazil

A new project in CASS has been funded jointly by the UK’s Economic and Social Research Council and the Brazilian research agency CONFAP. The project will involve a collaboration between two Lancaster academics (Professors Elena Semino and Tony McEnery) and two Brazilian academics: Professor Heloísa Pedroso de Moraes Feltes (University of Caxias do Sul) and Professor Ana Cristina Pelosi (University of Santa Cruz do Sul and Federal University of Ceará). The team will employ corpus methods to investigate the linguistic representation of urban violence in Brazil.

Urban violence is a major problem in Brazil: the average citizen is affected by acts of violence, more or less directly, on a daily basis. This creates a general state of fear and insecurity among the population, but, at the same time, may promote a sense of empathy with the less privileged classes in Brazil. Urban violence is also a regular topic in daily conversations and news media, so that people’s perceptions of the nature of this phenomenon are partly mediated by discourse. In particular, daily press reports of acts of violence may affect people’s views and attitudes in ways which may or may not be consistent with the actual incidence, forms and causes of violence.

This collaborative project will investigate the linguistic representation of urban violence in Brazil by applying the methods of Corpus Linguistics to two corpora:

  1. The existing transcripts of two focus groups on living with urban violence conducted in Fortaleza, Brazil, for a total of approximately 20,000 words;
  2. A new 2-million-word corpus of news reports in the Brazilian press, to be constructed as part of the partnership.

The linguistic representation of urban violence in the two corpora will be investigated by analysing lexical and semantic concordances, collocational patterns and key words. The two corpora will also be compared, in order to identify similarities and differences in the types of violence that are primarily talked about and in how they are linguistically represented.
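As a rough illustration of what a key word comparison between two corpora involves, here is a minimal sketch in Python. It assumes a log-likelihood keyness score, which is one common choice in corpus linguistics; the project description does not say which statistic or software will be used. The tokenizer, the function names and the toy texts are all invented for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[^\W\d_]+(?:[-'][^\W\d_]+)*", text.lower())


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood keyness of one word, given its frequency in corpora A and B."""
    total = freq_a + freq_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    score = 0.0
    if freq_a:
        score += freq_a * math.log(freq_a / expected_a)
    if freq_b:
        score += freq_b * math.log(freq_b / expected_b)
    return 2 * score


def keywords(text_a, text_b, top_n=10):
    """Words relatively more frequent in text A than in text B, ranked by keyness."""
    counts_a, counts_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    size_a, size_b = sum(counts_a.values()), sum(counts_b.values())
    scored = []
    for word, freq_a in counts_a.items():
        freq_b = counts_b[word]  # a Counter returns 0 for unseen words
        if freq_a / size_a > freq_b / size_b:
            scored.append((word, log_likelihood(freq_a, size_a, freq_b, size_b)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Invented toy texts standing in for the focus-group and press corpora.
focus_groups = "robbery robbery robbery fear street"
press = "street street home fear fear"
print(keywords(focus_groups, press))
```

With real corpora of very different sizes (about 20,000 words against 2 million), the relative-frequency check matters: it keeps only words that are over-represented in the first corpus rather than simply more numerous.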

The comparative analysis of the two corpora will make it possible to explore in detail the relationships between official statistics about urban violence, media representations and citizens’ views. A better understanding of these relationships can help to alleviate the consequences of urban violence on citizens’ lives, and to foster attitudes conducive to the solution of the social problems that cause the violence in the first place.

Twitter host CASS event: Twitter rape threats and the Discourse of Online Misogyny

Twitter’s public policy team will tomorrow host an event organised by the Discourse of Online Misogyny (DOOM) project team at CASS. The team consists of Dr. Claire Hardaker, Lecturer in Corpus Linguistics in the Department of Linguistics and English Language, and Mark McGlashan, Senior Research Associate on the DOOM project.

The event assembles a number of key stakeholders drawn from Twitter’s global public policy team, law enforcement, the prosecution service, NGOs, and academia (law, psychology, computing, linguistics) to discuss public concerns about online abuse. At the event, the DOOM project team will talk about their work on rape threats made using Twitter and how their unique methods enabled analysis of how people made rape threats using language as well as how abusive social networks are formed online.

The DOOM team will stress the need for corpus-based approaches in the research and development of tools and methods for investigating and tackling online abuse.

For more information, contact the Lancaster University Press Office:

Email: pressoffice(Replace this parenthesis with the @ sign)

Call: +44 (0)1524 594120

Tweet: @LancasterPress

[more to follow]

Turning the tables on the stalkers

On 13th November, I presented a talk at a joint Paladin/Collyer Bristow event. Paladin, the National Stalking Advocacy Service, assists high-risk victims of stalking throughout England and Wales. Collyer Bristow’s Cyber Investigation Unit (CIU), which is headed up by partner Rhory Robertson, comprises a dedicated team of lawyers who advise victims of cyberstalking, cyber harassment, cyber bullying and internet trolls/trolling.

The main discussion at this event centred on cyberstalking: how it affects victims, and how to combat this increasingly prevalent crime. My talk covered the research we are currently undertaking on the DOOM project, and how networks of abusive individuals can form online, potentially leading to escalation. Other speakers at the event included:

  • Nadine Dorries MP, Conservative MP for Mid Bedfordshire, and a cyberstalking target
  • Betsy De Thierry, Founder Director of the Trauma Recovery Centre, and a cyberstalking target
  • Laura Richards, CEO of Paladin, the National Stalking Advocacy Service
  • Steve Slater, Computer Forensics Manager, Devon & Cornwall Police
  • Alison Morgan, Barrister at 6KBW

Several points clearly emerged from this event, including the fact that there is much work to be done around stalking and harassment, that victims need to be taken much more seriously, that the correct legislation needs to be applied when prosecuting, and the recognition that online stalking can be just as damaging – if not more so in some cases – as physical, offline stalking.

New CASS Briefing now available — The EDL: moving right-wing populism online in the UK

The English Defence League (EDL) is a far-right populist political movement that campaigns specifically on issues concerning the presence of Muslims and Islam in Western societies. This briefing from CASS presents the results of a corpus study on the online activities of the EDL and its supporters. The briefing shows that, although the hierarchy of the EDL claims to be specifically concerned with radical Islam, the discourse of supporters is less focussed and contains more explicit forms of Islamophobia.

New resources are being added regularly to the new CASS: Briefings tab above, so check back soon.

Participate in our ESRC Festival of Social Sciences “Language Matters” event online

We are very pleased to announce an event that we are live streaming on YouTube and Google+ next week. We hope you can find time to attend online*; if not, the recording will be available on YouTube afterwards.

From 17:30 to 19:00 GMT on 4 November, the ESRC Centre for Corpus Approaches to Social Science is hosting a live event in association with the ESRC Festival of Social Sciences and in tandem with our popular FutureLearn course. We would be thrilled if you could ‘tune in’ and collaborate with us during “Language Matters: Communication, Culture, and Society”.

This evening is a mini-series of four informal talks showcasing the impact of language on society. These are presented by some leading names in corpus linguistics (including the CASS Principal Investigator, Tony McEnery) and their talks draw upon the most popular themes in our corpus MOOC:

- What can corpora tell us about learning a foreign language? (with Vaclav Brezina)
- A ‘battle’, a ‘journey’, or none of these? Metaphors for cancer (with Elena Semino)
- Wolves in the wires: online abuse from people to press (with Claire Hardaker)
- Words ‘yesterday and today’ (with Tony McEnery, Claire Dembry, and Robbie Love)

Though we pride ourselves on bringing interesting, accessible material to people on the go, what really brings these events to life is the interactions that we have with attendees. That’s why we invite you to log in and contribute to the discussions taking place after each presentation.

There are two ways to virtually attend.

First, via Google Hangout if you have a Google account. Sign up and then log in from 17:15 GMT on 4 November to greet your fellow participants.

If you don’t have a Google account, you can watch us on YouTube with no registration.

We’ll be taking questions from the Google Hangout and from the #corpusMOOC hashtag on Twitter (particularly for those viewing on YouTube) and mixing these in with questions from our live audience.

We hope that you can take advantage of this event by participating online.

* If you are available, located in the London area, and would like to attend in person, please visit our event website to register.

A three-parent baby or a change of battery? Language in the ethical debate on mitochondrial donation

On 22nd October 2014, the House of Commons Science and Technology Committee will hold a one-off evidence session on a new human fertilisation technique variously known as mitochondrial donation, mitochondrial transfer or mitochondrial replacement. This technique is intended to help women who carry serious genetic diseases (e.g. muscular dystrophy) that are passed to the embryo through the mitochondria, the structures in the egg, outside the nucleus, that supply the cell’s energy. In such cases, the cell’s mitochondria would be replaced with mitochondria from a healthy donated egg immediately before or after fertilisation, thus eliminating the possibility that the child will inherit the genetic disease.

The first embryo with donated mitochondria was successfully created at Newcastle University in 2010. In 2012, the Nuffield Council on Bioethics approved the procedure. However, the technique has not yet been legally approved in the UK. Two public consultations have found that the majority of people are in favour of introducing the technique, but have also revealed some opposition. Previous parliamentary discussions have primarily focussed on the safety of the procedure. However, concerns have been expressed both in Parliament and in the media about the ethics of manipulating the genetic make-up of human embryos.

As far as the ethical issues are concerned, the language used to describe the procedure is crucial, especially in media reporting. In order to study this language systematically, we constructed a dataset (corpus) including all relevant news reports published in the UK press between April 2010 (when the Newcastle team announced the success of the technique), and September 2014. The corpus contains a total of 119 news articles, amounting to 64,804 words. We have found that, in our data, the words used to express the case for or against approval frame the issue in opposite and irreconcilable ways. This, we suggest, reduces the chances of a reasoned debate, and makes it difficult to see the merits of the case.

The case in favour: changing a faulty battery

In April 2010, Newcastle University issued a press release in which one of the directors of the research, Professor Doug Turnbull, explains the new procedure as follows:

“Every cell in our body needs energy to function. This energy is provided by mitochondria, often referred to as the cells’ ‘batteries’. Mitochondria are found in every cell, along with the cell nucleus, which contains the genes that determine our individual characteristics. The information required to create these ‘batteries’ – the mitochondrial DNA – is passed down the maternal line, from mother to child.


“What we’ve done is like changing the battery on a laptop. The energy supply now works properly, but none of the information on the hard drive has been changed,” […] “A child born using this method would have correctly functioning mitochondria, but in every other respect would get all their genetic information from their father and mother.”

The ‘battery metaphor’ is one of the main rhetorical strategies used in our data to suggest that the procedure poses no ethical issues, and should thus be approved on medical grounds: most people can relate to how changing the battery in an appliance does not affect its essential characteristics. The noun battery occurs 38 times in the data, including both the singular and plural forms. We used a new software tool to find the top ‘collocates’ of the singular form battery, i.e. the words that are strongly associated with this word in our corpus. This tool displays collocates as a network with the search word in the centre (see figure 1).


Figure 1 – Collocation network for battery
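The collocate search described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool used in the study, which is not named here. The window of a few tokens either side of the search word, the mutual information (MI) score, the minimum co-occurrence threshold and all function names are assumptions made for the example, and the sample sentences are invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and keep alphabetic tokens (hyphens and apostrophes allowed inside words)."""
    return re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())


def collocates(tokens, node, span=5, min_cooccurrence=2):
    """Return (collocate, co-occurrence count, MI score) for words near `node`.

    `span` is the number of tokens considered on each side of the node.
    """
    freq = Counter(tokens)
    n = len(tokens)
    cooc = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Collect the tokens to the left and right of this occurrence.
        window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
        cooc.update(window)
    results = []
    for word, observed in cooc.items():
        if observed < min_cooccurrence or word == node:
            continue
        # Co-occurrences expected if the two words were independent.
        expected = freq[node] * freq[word] * 2 * span / n
        results.append((word, observed, math.log2(observed / expected)))
    return sorted(results, key=lambda r: r[2], reverse=True)


# Invented sample text; a real analysis would load the news corpus instead.
sample = ("The faulty battery was replaced. A faulty battery pack fails. "
          "The cell needs energy.")
for word, count, mi in collocates(tokenize(sample), "battery", span=2):
    print(word, count, round(mi, 2))
```

A network display like the one in figure 1 could then be built by drawing an edge from the search word to each collocate that passes the thresholds.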

Battery is closely linked with the technical term mitochondria on the one hand, and, on the other hand, with a small set of words that belong to the ‘battery’ metaphorical scenario: pack, faulty, replacing and changing. The extracts below are instances of the pattern displayed in figure 1:

About one in 6,500 children are born with defects in their mitochondria – the “batteries” that power each cell.

The new techniques would see defects in a cell’s battery pack, the mitochondria, replaced by a healthy version supplied by a woman donor

[Mitochondria] are like batteries in a camera or a laptop – you can change them without changing anything else. The child’s identity will come from its two parents, who determine the nuclear DNA.

In these extracts, the focus is on the way in which serious medical problems can be avoided by means of an intervention at the level of cells.

The case against: three-parent babies

The case against approval focuses on the babies who would be born as a result of the procedure, and particularly on their kinship relationships with the people whose cells would be involved in the creation of the embryo: the woman who carries the genetic disease, the woman who donates the healthy mitochondria, and the man whose sperm is used to fertilise the egg.

The word baby as a singular noun occurs 99 times in the corpus, and the plural form babies occurs 268 times. Figure 2 shows the network of words that centres around the plural form babies in our corpus.


Figure 2 – Collocation network for babies

As figure 2 shows, the collocates of babies include:

  • Words that relate to the debate, and to the issue of official approval: approve, legalise, draft, sanction, permit, backing, comment, ministers.
  • Words that relate to the procedure itself and its outcome: create, created, creation, order, genetically, modified, GM, designer, eugenics, three, three-parent.

The second group in particular reveals the main argument against approval of the procedure, namely that it involves the creation of genetically modified babies with three biological parents. This, it is argued, would pave the way to a future where prospective parents can choose the characteristics of their children, such as eye colour. The following extracts express this position:

Three-parent babies may never know their ‘second’ mother

Government accused of dishonesty over GM babies

Dr David King, of watchdog Human Genetics Alert, said: “This will eventually lead to a designer baby market. [...]”

Done differently, it could lead to the creation of designer babies, made to order by hair colour or eye colour.

More specifically, the corpus contains 40 instances of three-parent baby/babies, 33 instances of designer baby/babies and 12 instances of GM babies. In some articles, these phrases are used to place mitochondrial donation alongside other ethically controversial issues:

Issues ranging from fracking to three-parent babies and genetically modified crops are all difficult […].
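Phrase counts of the kind reported above can be obtained with a simple pattern search. The sketch below is only an illustration under stated assumptions: the function name is invented, it treats the singular and plural forms together as the counts above do, and it does not handle spelling variants such as "three parent" without a hyphen. The sample text is made up of three of the extracts quoted in this post.

```python
import re


def count_phrase(text, modifier):
    """Count '<modifier> baby' and '<modifier> babies' in a text, ignoring case."""
    pattern = re.compile(
        r"\b" + re.escape(modifier) + r"\s+bab(?:y|ies)\b",
        re.IGNORECASE,
    )
    return len(pattern.findall(text))


sample = (
    "Three-parent babies may never know their 'second' mother. "
    "This will eventually lead to a designer baby market. "
    "Government accused of dishonesty over GM babies."
)
for modifier in ("three-parent", "designer", "GM"):
    print(modifier, count_phrase(sample, modifier))
```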

The problem with the two alternative linguistic framings

The cases for and against approval of mitochondrial donation are expressed in the press in ways that polarise the issue in an extreme, and arguably unhelpful, fashion. In the case against, the creation of a human baby from the genetic material of three people results in a genetically modified, designer human being, and in an abnormal kinship relationship involving two mothers and three parents. In the case in favour, the use of mitochondria from a donated egg is a mechanical process that has negligible genetic implications and no abnormal kinship implications at all. More generally, the case against focuses on the people involved in the process and their relationships, while the case in favour focuses on what scientists do in a lab in order to prevent serious incurable conditions. As figures 1 and 2 show, the two networks centering on babies and battery do not meet: they have no words in common. For example, the verb form associated with the battery network is replace, whereas for babies it is create.
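The observation that the two networks have no words in common amounts to an empty set intersection. The short sketch below checks this for the collocates named in this post; the figures may display further collocates that are not listed in the text, so the sets here are only the ones quoted above, and the function name is invented for the example.

```python
# Collocates of 'battery' and 'babies' as listed in the text of this post.
battery_collocates = {"mitochondria", "pack", "faulty", "replacing", "changing"}
babies_collocates = {
    "approve", "legalise", "draft", "sanction", "permit", "backing",
    "comment", "ministers", "create", "created", "creation", "order",
    "genetically", "modified", "gm", "designer", "eugenics", "three",
    "three-parent",
}


def overlap(a, b):
    """Return the shared items and the Jaccard similarity of two sets."""
    shared = a & b
    union = a | b
    similarity = len(shared) / len(union) if union else 0.0
    return shared, similarity


shared, similarity = overlap(battery_collocates, babies_collocates)
print(sorted(shared), similarity)
```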

In this context, it is difficult for non-experts to make sense of the complex scientific issue that underlies the ethical questions, namely the function of mitochondria and their role in the genetic make-up of human beings. Those who adopt the ‘battery metaphor’ tend to point out that mitochondria only provide 0.1% of a human being’s genetic material, none of which influences the characteristics that we associate with identity and uniqueness. Those who adopt the ‘three-parent’ view implicitly suggest that two women are equally involved in the creation of the embryo, presumably because the provision of any amount of genetic material would constitute biological parenting.

The language used in the media to represent both sides over-simplifies and polarises the issue, and therefore makes it difficult to understand the basis of the disagreement. It would be desirable to have a debate that enables the public to appreciate the nature and complexity of the scientific issues, so that they can form a reasoned view of the implications of the introduction of the procedure. To achieve that, both sides have to abandon the current linguistic framings, and find a common linguistic ground from which to argue their respective cases.

A Journey into Transcription, Part 3: Clarity

As audio transcribers we listen to sound. Of primary importance is the clarity of the sound.



Clarity: the quality of being clear (‘easy to perceive, understand, or interpret’), in particular:

  • The quality of being coherent and intelligible
  • The quality of being easy to hear; sharpness of sound
  • The quality of purity

Let’s consider these qualities and their relevance to the audio transcriber.

The quality of being coherent and intelligible

All of us, when engaged in discussion and conversation, want our language to be coherent and intelligible. However, for the transcriber listening to a recording, its clarity in the sense of being coherent and intelligible is something of a paradox; it is simultaneously useful and yet also to be ignored.

Naturally, we know that our brains are programmed to attempt to organise and make sense of language. In this sense, context can often present the transcriber with an invaluable clue to making out words which may be difficult to hear in a recording.

At the initial drafting stage of transcription, what we hear at first can turn out to be quite different when we re-listen, edit and proofread the transcript with the glorious benefit of wider context to assist us. Here are a few of the more entertaining examples:

you wear glasses becomes yoga classes

it’s among the becomes it’s a manga [comic]

yes she was becomes H G Wells

whisking gently becomes whiskey J&B [discussing a recipe!]

However, since the raison d’être of this corpus is as a basis for research into the language of learners, part of the skill here is in not being distracted by our knowledge of grammatical rules and the surrounding context.

The audio transcriber’s task is to hear what the learner actually says; this may not always be what they (or we) think or expect might be logical or appropriate (or desirable!). Indeed, the transcription conventions are designed specifically to minimise the possibility of this happening during the transcription process. In the context of a Graded Examination in Spoken English (GESE) the students (and, on rare occasions, the examiners) can, and sometimes do, say anything!

Below are a few examples of wrong words and non-words which are to be transcribed, alongside words which may have been intended by the speaker:


Welcome our new CASS postgraduate students!

Last week, we had the pleasure of welcoming four new postgraduate students to the centre. Abi, Jennifer, Róisín, and Gillian have now joined last year’s postgraduates Robbie and Amelia in our ever-livelier corridors. These four represent a great range of interests (both academic and personal), and their research promises to be very exciting indeed. Introducing our new postgrads, in their own words:

Abi Hawtin

I’m currently in my first year of a 1+3 studentship at CASS. My research is concerned with the methodological issues surrounding the building of corpora, but I’m also interested in how corpus approaches can be applied to critical discourse analysis, online communication, and the relationship between language and gender.

I grew up in Leamington Spa in the West Midlands, and then moved to Lancaster to study for my undergraduate degree in English Language and Linguistics here at Lancaster University. Before choosing my degree I had never even heard of ‘linguistics’, but came across it when trying to find a course that would combine my interests in language and science. I quickly discovered that linguistics is often defined as ‘the scientific study of language’ and haven’t looked back since! I became interested in corpus linguistics in my third year of undergraduate study, when we were shown how the combination of qualitative and quantitative methods could be used to provide insight into real world language use in many different areas of linguistics.

When I’m not working with words I can usually be found with my nose in a book (probably Harry Potter)!

Jennifer Hughes

I am a Research Student at CASS in the first year of my PhD in Linguistics. My PhD focuses on finding psycholinguistic evidence for collocation using EEG. I became interested in this topic whilst doing my BA in English Language and Linguistics at Lancaster, when I took modules in Psycholinguistics and Corpus Linguistics. I then developed this interest during my MA in Language and Linguistics, also at Lancaster, when I wrote a dissertation on how English collocations are processed by native speakers and learners of English.

During my PhD I am looking forward to gaining a more in-depth knowledge of Corpus Linguistics by, for example, exploring the different methods of extracting collocations from a corpus. I am also excited about learning how to use the EEG machine, conducting experiments, and learning more about Psychology in general.

Aside from my academic interests, I also really like dancing and do a variety of styles including tap, ballet, Irish, jazz, and contemporary.

Róisín Knight

I first came to Lancaster as an undergraduate studying English Language and Sociolinguistics. I absolutely loved my degree and enjoyed being introduced to many different areas of Linguistics. Once I had graduated, several lecturers parted with the words, “We’ve not seen the last of you… you’ll be back!”.

I then moved to London and trained at the Institute of Education to be a Secondary School English Teacher. I taught for two very crazy, exhausting but ultimately fun years. If there is one thing teaching taught me, it is the true meaning of the phrase ‘emotional rollercoaster’.

It turned out my lecturers were right; I soon missed being able to devote time to studying and completing my own research. I wanted a way to combine my interests in Linguistics with my teaching skills, and this sparked the idea for my PhD topic: investigating how corpus linguistic methods can aid the assessment of Key Stage 3 students’ creative writing.

I was fortunate enough to be offered 1+3 funding from ESRC, so I quit my teaching job (much to my students’ confusion: “how can you be a doctor without knowing medicine?”) and dragged my boyfriend back ‘up north’ (much to his displeasure: “but I don’t want to end up sounding northern!”). I’m really excited to be a new member of CASS, and I’m looking forward to providing updates on this website soon detailing some of the research I’ve been carrying out.

Gillian Smith

I am an MA student in the first year of a 1+3 PhD studentship. My research focus is the application of corpus-based approaches to the study of classroom interactions of children with communicative difficulties, specifically investigating how teaching strategies affect their linguistic and social development.

I grew up in a tiny village in the middle of Yorkshire that was so remote I inevitably became a bookworm and hence knew from an early age that I wished to pursue higher academic study. Having been inspired by an exceptional GCSE English teacher, I decided to pursue the subject further, taking A-level English Language and coming to Lancaster in 2011 to study BA English Language and Literature. In the final two years of my undergraduate degree I dropped literature to pursue my English Language studies and subsequently discovered my two main research interests: the study of communication disorders and corpus linguistics. Study of the linguistic manifestation of communicative disorders fascinated me and I was drawn to the widespread and practical applications that corpus linguistics offers.

As postgraduate study was always on my agenda, being given the opportunity to study my specific research interests in CASS was a dream come true. Through links with the centre I have already been given the chance to study in China for a month, which was an incredible experience, and I am looking forward to the opportunities that being a research student in CASS will continue to offer.

Are you a current postgraduate student interested in visiting Lancaster University for a research stay, or a current undergraduate student considering taking up a Masters or PhD featuring an element of corpus linguistics? Get in touch (write to cass(Replace this parenthesis with the @ sign) to see if there are any opportunities to work with CASS.

Remember also to check back periodically to hear updates on what our postgrads are studying and researching.

Brainstorming the Future of Corpus Tools

Since arriving at the Centre for Corpus Approaches to Social Science (CASS), I’ve been thinking a lot about corpus tools. As I wrote in my blog entry of June 3, I have been working on various software programs to help corpus linguists process and analyse texts, including VariAnt, SarAnt, and TagAnt. Since then, I’ve also updated my mono-corpus analysis toolkit, AntConc, as well as my desktop and web-based parallel corpus tools, including AntPConc and the interfaces to the ENEJE and EXEMPRAES corpora. I’ve even started working with Paul Baker of Lancaster University on a completely new tool that provides detailed analyses of keywords.

In preparation for my plenary talk on corpus tools, given at the Teaching and Language Corpora (TaLC 11) conference held at Lancaster University, I interviewed many corpus linguists about their uses of corpus tools and their views on the future of corpus tools. I also interviewed people from other fields about their views on tools, including Jim Wild, the Vice President of the Royal Astronomical Society.

From my investigations, it was clear that corpus linguists rely on and very much appreciate the importance of tools in their work. But, it also became clear that corpus linguists can sometimes find it difficult to see beyond the features of their preferred concordancer or word frequency generator and attempt to look at language data in completely new and interesting ways. An analogy I often use (and one I detailed in my plenary talk at TaLC 11) is that of an astronomer. Corpus linguists can sometimes find that their telescopes are not powerful enough or sophisticated enough to delve into the depths of their research space. But, rather than attempting to build new telescopes that would reveal what they hope to see (an analogy to programming) or working with others to build such a telescope (an analogy to working with a software developer), corpus linguists simply turn their telescopes to other areas of the sky where their existing telescopes will continue to suffice.

To raise awareness of corpus tools in the field and also generate new ideas for corpus tools that might be developed by individual programmers or within team projects, I proposed the first corpus tools brainstorming session at the 2014 American Association for Corpus Linguistics (AACL 2014) conference. Randi Reppen and the other organizers of the conference strongly supported the idea, and it finally became a reality on September 25, 2014, the first day of the conference.

At the session, over 30 people participated, filling the room. After I gave a brief overview of the history of corpus tools development, the participants thought about the ways in which they currently use corpora and the tools needed to do their work. The usual suspects (frequency lists and frequency list comparisons, keyword-in-context concordances and plots, clusters and n-grams, collocates, and keywords) were all mentioned. In addition, the participants talked about how they are increasingly using statistics tools and also starting to write their own programs to find dispersion measures. A summary of the ways people use corpora is given below:

  • find word/phrase patterns (KWIC)
  • find word/phrase positions (plot)
  • find collocates
  • find n-grams/lexical bundles
  • find clusters
  • generate word lists
  • generate keyword lists
  • match patterns in text (via scripting)
  • generate statistics (e.g. using R)
  • measure dispersion of word/phrase patterns
  • compare words/synonyms
  • identify characteristics of texts
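To make a couple of these uses concrete, here is a minimal Python sketch of a keyword-in-context (KWIC) search and an n-gram count. It is only an illustration: the function names, the very simple tokenizer, and the sample sentence are my own assumptions for this example and do not reflect how AntConc or any other tool is implemented.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (a crude assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def kwic(tokens, node, window=3):
    """Return (left context, node, right context) for every occurrence of `node`."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


def ngram_counts(tokens, n=2):
    """Count every contiguous sequence of n tokens."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


if __name__ == "__main__":
    # Hypothetical sample text, used only to show the calls.
    sample = tokenize("The cat sat on the mat. The cat slept.")
    for left, node, right in kwic(sample, "cat", window=2):
        print(f"{left:>12} [{node}] {right}")
    print(ngram_counts(sample, 2).most_common(3))
```

A real concordancer would also need to deal with encodings, annotation, and sorting of the context columns, which this sketch leaves out.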

Next, the participants formed groups, and began brainstorming ideas for new tools that they would like to see developed. Each group came up with many ideas, and explained these to the session as a whole. The ideas are summarised below:

  • compute distances between subsequent occurrences of search patterns (e.g. words, lemmas, POS)
  • quantify the degree of variability around search patterns
  • generate counts per text (in addition to corpus)
  • extract definitions
  • find patterns of range and frequency
  • work with private data but allow for powerful handling of annotation (e.g. comparing frequencies of sub-corpora)
  • carry out extensive move analysis over large texts
  • search corpora by semantic class
  • process audio data
  • carry out phonological analysis (e.g. neighbor density)
  • use tools to build a corpus (e.g. finding texts, annotating texts, converting non-ASCII characters to ASCII)
  • create new visualizations of data (e.g. a roman candle of words that ‘explode’ out of a text)
  • identify the encoding of corpus texts
  • compare two corpora along many dimensions
  • identify changes in language over time
  • disambiguate word senses
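Some of these ideas are small enough to prototype in a few lines. As a rough sketch, and not something that was presented at the session, the Python below computes the distances between subsequent occurrences of a search word and generates counts per text. The function names and the toy corpus are hypothetical, and the texts are assumed to be tokenized already.

```python
from collections import Counter


def occurrence_gaps(tokens, target):
    """Return the distance in tokens between each pair of consecutive hits of `target`."""
    positions = [i for i, tok in enumerate(tokens) if tok == target]
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


def counts_per_text(corpus, target):
    """Map each text name to the frequency of `target` in that text."""
    return {name: Counter(tokens)[target] for name, tokens in corpus.items()}


if __name__ == "__main__":
    # Hypothetical two-text corpus, already split into lowercase tokens.
    corpus = {
        "text_a": ["the", "cat", "sat", "on", "the", "mat", "the", "cat", "slept"],
        "text_b": ["a", "dog", "barked"],
    }
    print(occurrence_gaps(corpus["text_a"], "the"))
    print(counts_per_text(corpus, "the"))
```

The same pattern could be extended to lemmas or POS tags by matching on an annotation instead of the token itself.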

From the list, it is clear that the field is moving towards more sophisticated analyses of data. People are also thinking of new and interesting ways to analyse corpora. But, perhaps the list also reveals a tendency for corpus linguists to think more in terms of what they can do rather than what they should do, an observation made by Douglas Biber, who also attended the session. As Jim Wild said when I interviewed him in July, “Research should be led by the science not the tool.” In corpus linguistics, clearly we should not be trapped into a particular research topic because of the limitations of the tools available to us. We should always strive to answer the questions that need to be answered. If the current tools cannot help us answer those questions, we may need to work with a software developer or perhaps even start learning to program ourselves so that new tools will emerge to help us tackle these difficult questions.

I am very happy that I was able to organize the corpus tools brainstorming session at AACL 2014, and I would like to thank all the participants for coming and sharing their ideas. I will continue thinking about corpus tools and working to make some of the ideas suggested at the session become a reality.

The complete slides for the AACL 2014 corpus tools brainstorming session can be found here. My personal website is here.