New CASS Briefing now available — How to communicate successfully in English?

How to communicate successfully in English? An exploration of the Trinity Lancaster Corpus. Many speakers use English as their non-native language (L2) to communicate in a variety of situations: at school, at work or in other everyday situations. As well as needing to master the grammar and vocabulary of the English language, L2 users of English need to know how to react appropriately in different communicative situations. In linguistics, this aspect of language is studied under the label of “pragmatics”. This briefing offers an exploration of the pragmatic features of L2 speech in the Trinity Lancaster Corpus of spoken L2 production.

New resources are being added regularly to the new CASS: Briefings tab above, so check back soon.

The spectre of Nazism haunts social media

Each time there is an upsurge in the Israel-Palestine conflict there is a rise in violent and other abusive incidents against Jews around the world. This phenomenon is now well-known. So it was in 2014 with Israel’s military operation ‘Protective Edge’ in July and August. Numerous backlash incidents against Jews in the UK and elsewhere in the world were reported by news media.

The conflict between Israelis and Palestinians has become a global phenomenon spreading from Gaza and the Occupied Territories of the West Bank into some of Europe’s major cities and other cities around the world. Jews are seemingly targeted as representatives for the State of Israel and attacked as proxies for the Israel Defence Force. It is a crude form of political violence.

In the UK we have the most robust data collected internationally on the problem of anti-Jewish incidents. Last year, such incidents reportedly more than doubled compared to 2013, according to a report published by the Community Security Trust.[1]

What was noticeable this last time around in the Israel-Gaza conflict of July and August 2014 was an apparent upsurge of abuse against Jews on social media. By the end of July 2014, some of the press were reporting an “explosion” of such abuse.

John Mann MP, the chair of the All-Party Parliamentary Group Against Antisemitism, instigated a parliamentary inquiry into the lessons that could be learned from the upsurge of anti-Jewish incidents associated with last year’s conflict. The report of that inquiry was published this week. It includes some of the key findings concerning anti-Jewish abuse on social media produced by a rapid response analysis commissioned from a team at Lancaster University — Paul Iganski and Abe Sweiry from the Lancaster University Law School, along with Mark McGlashan — as part of their work with the Lancaster University ESRC Centre for Corpus Approaches to Social Science.

We downloaded a sample of 22 million Tweets from July and August 2014 and carried out a detailed analysis of a sub-sample of 38,460 Tweets containing the words “Israel” or “Gaza”, along with the words “Jew”, “Jews” or “Jewish”.
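The sampling criterion described above can be sketched in a few lines of Python. This is only an illustration: the function and pattern names are hypothetical, and the case-insensitive, whole-word matching is an assumption, since the exact matching rules used for the sub-sample are not stated here.

```python
import re

# Hypothetical patterns mirroring the criteria described in the text.
# Case-insensitive, whole-word matching is an assumption of this sketch.
CONFLICT_TERMS = re.compile(r"\b(israel|gaza)\b", re.IGNORECASE)
IDENTITY_TERMS = re.compile(r"\b(jew|jews|jewish)\b", re.IGNORECASE)


def in_subsample(tweet_text: str) -> bool:
    """Return True if the tweet contains a conflict term AND an identity term."""
    return bool(CONFLICT_TERMS.search(tweet_text)) and bool(
        IDENTITY_TERMS.search(tweet_text)
    )


def build_subsample(tweets):
    """Keep only the tweets that satisfy both conditions."""
    return [t for t in tweets if in_subsample(t)]
```

Under whole-word matching, a form such as “Israeli” would not count as a match for “Israel”; whether the original sampling treated such forms as matches is not known.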

The results were very telling:

  • A keyword analysis – one of the core methods of corpus linguistics – showed that in the sub-sample analysed, the spectre of Nazism, with words such as “Hitler”, “Holocaust”, “Nazi” and “Nazis”, was present in the top 35 keywords for the downloaded sample. “Hitler” was mentioned 1117 times; “Holocaust” was mentioned in 505 tweets; and “Nazi” or “Nazis” were mentioned in 851 tweets.
  • The Nazi theme was also evident in hashtags analysed for the sub-sample, with the high frequency of the hashtags #hitler, #hitlerwasright, and #genocide.
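For readers unfamiliar with keyword analysis, the sketch below shows one common way of scoring keywords: comparing each word's frequency in a study corpus against a reference corpus using the log-likelihood statistic. The choice of statistic is an assumption made for illustration, as the measure used in the analysis is not stated here, and the function names are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness score for one word across two corpora."""
    combined = freq_study + freq_ref
    total = size_study + size_ref
    # Frequencies expected if the word were equally common in both corpora.
    expected_study = size_study * combined / total
    expected_ref = size_ref * combined / total
    score = 0.0
    if freq_study > 0:
        score += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        score += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * score


def keywords(study_tokens, ref_tokens, top_n=35):
    """Rank words that are relatively more frequent in the study corpus."""
    study, ref = Counter(study_tokens), Counter(ref_tokens)
    n_study, n_ref = len(study_tokens), len(ref_tokens)
    scored = []
    for word, freq in study.items():
        # Only keep words over-represented in the study corpus.
        if freq / n_study > ref[word] / n_ref:
            scored.append((word, log_likelihood(freq, n_study, ref[word], n_ref)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]
```

Both token lists are assumed to be non-empty and already lower-cased and tokenised; real tweets would need extra handling for hashtags, mentions and URLs.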

While providing a very useful indication of patterns of discourse, keyword analysis and hashtag analysis alone are never sufficient: the contexts of the tweets in which the keywords and hashtags are situated need to be interpreted. Using the linguistic technique of collocation analysis, tweets that seemed to express negative sentiment targeted explicitly at ‘Jews’ were isolated and subjected to a closer reading. Sadly, there was little interpretation that needed to be applied to our sample. The sentiments conveyed were stark:

  • Some contained explicit anti-Jewish invective which if shouted out on the streets – as does happen in many incidents – would clearly be racially or religiously aggravated public order offences.
  • Others wished violence upon Jews as proxies for Israelis, or simply just as Jews.
  • A number expressed the type of sentiment that “Hitler should have finished the job”. Some of these invoked Hitler to return for the task.
  • In other tweets, the use of gas chambers for Jews was invoked.
  • Others simply included Nazi-slogans.

Deep wounds are scratched when the Nazi-card is played in this way in discourse against Jews. Playing the Nazi-card is not simply abusive. It invokes painful collective memories for Jews and for many others. By using those memories against Jews it inflicts profound hurts. Those who play the Nazi-card know exactly what it means.

Reaction to the military practices of the Israeli state can be expressed in a variety of forceful and trenchant ways – none of which would be antisemitic. The hurts inflicted against Jews when the Nazi card is played cannot be written-off as collateral damage in the protest against Israel, just as the deaths and injuries of innocent Palestinian civilians cannot be written-off as the inevitable casualties of war. As Professor David Feldman, Director of the Pears Institute for the Study of Antisemitism, stated in his written evidence to the All-Party Parliamentary Inquiry Against Antisemitism, playing the Nazi-card with a statement such as ‘Hitler was right’, “invokes both a set of antisemitic stereotypes and a genocidal project targeted at Jews”.[2]

In the UK a sufficient statutory framework is arguably in place to prosecute the types of anti-Jewish abuse we identified, through proceedings under the Malicious Communications Act 1988 or the Communications Act 2003.[3] In such proceedings courts can treat the anti-Jewish abuse as racial or religious aggravation according to the Criminal Justice Act 2003. The inquiry’s recommendation that the Crown Prosecution Service should give consideration “to the suitability of existing guidance on communications sent via social media” and “that hate crime guidance material on grossly offensive speech be reviewed to clarify what amounts to ‘criminal acts’ that ‘will be prosecuted’”[4] is therefore opportune.

Trinity Lancaster Corpus at the International ESOL Examiner Training Conference 2015

On Friday 30th January 2015, I gave a talk at the International ESOL Examiner Training Conference 2015 in Stafford. Every year, Trinity College London, CASS’s research partner, organises a large conference for all its examiners, which consists of plenary lectures and individual training sessions. This year, I was invited to speak in front of an audience of over 300 examiners about the latest developments in the learner corpus project. For me, this was a great opportunity not only to share some of the exciting results from the early research based on this unique resource, but also to meet the Trinity examiners; many of them have been involved in collecting the data for the corpus. This talk was therefore also an opportunity to thank everyone for their hard work and wonderful support.

It was very reassuring to see the high level of interest in the corpus project among the examiners, who have a deep insight into the examination process from their everyday professional experience. The corpus, as a body of transcripts from the Trinity spoken tests, in some way reflects this rich experience, offering an overall holistic picture of the exam and, ultimately, L2 speech in a variety of communicative contexts.

Currently, the Trinity Lancaster Corpus consists of over 2.5 million running words sampling the speech of over 1,200 L2 speakers from eight different L1 and cultural backgrounds. The size itself makes the Trinity Lancaster Corpus the largest corpus of its kind. However, it is not only the size that the corpus has to offer. In cooperation with Trinity (and with great help from the Trinity examiners) we were able to collect detailed background information about each speaker in our 2014 dataset. In addition, the corpus covers a range of proficiency levels (B1–C2 levels of the Common European Framework), which allows us to research spoken language development in a way that has not been previously possible. The Trinity Lancaster Corpus, which is still being developed with an average growth of 40,000 words a week, is an ambitious project. Using this robust dataset, we can now start exploring crucial aspects of L2 speech and communicative competence and thus help language learners, teachers and material developers to make the process of L2 learning more efficient and also (hopefully) more enjoyable. Needless to say, without Trinity as a strong research partner and the support from the Trinity examiners this project wouldn’t be possible.

New open-access CASS publication on discourses of maritime security

Dr Basil Germond’s latest article discusses the geopolitical dimension of maritime security, which has been neglected by scholars so far. The article analyses three practical examples of maritime security geo-strategies (texts) all released in 2014; one by the UK and two by the EU. The results demonstrate that states’ and international institutions’ maritime security objectives and interests are indirectly and directly influenced by geographical and geopolitical considerations, although this link is only tacitly acknowledged in official documents (narrative). Scholars and practitioners interested in maritime security are encouraged to further engage with this dimension at the practical and discursive level.

Basil Germond “The Geopolitical Dimension of Maritime Security”, Marine Policy 54 (April 2015), pp.137-142.

Marine Policy is an interdisciplinary journal in social science devoted to ocean policy studies. It has a 5-year impact factor of 2.948. 

Latest research on executive compensation by CASS co-investigator featured in Financial Times

Debate surrounding executive compensation is an enduring feature of the U.K. corporate landscape. Although concern over compensation levels continues to grab the attention of politicians and headline writers, concern is also growing over the extent to which performance measures that are widely used in executive compensation contracts (e.g., earnings per share growth and total shareholder return) represent appropriate measures of long-term corporate value creation. This debate partly reflects fears that U.K. executives face excessive pressure to deliver short-term results at the expense of long-term improvements in value.

The Chartered Financial Analysts (CFA) Society of the UK commissioned researchers at Lancaster to undertake a pilot study of executive compensation arrangements and their association with corporate value creation using a subsample of FTSE-100 companies over the period 2003 through 2013. While the results provide a degree of comfort they also create cause for concern. On the positive side, we document evidence of a material positive link between CEO pay and several measures of value creation. The evidence suggests that prevailing executive pay structures incentivize and reward important aspects of value creation even though contractual performance metrics are not directly linked with value creation in many cases.

More troubling, however, is the evidence that: a large fraction of CEO pay appears unrelated to periodic value creation; key aspects of compensation consistently correlate with performance metrics whose link with value creation is indirect at best; and in many cases the metrics used to incentivize and reward senior executives are not directly aligned with the key performance indicators (KPIs) that firms highlight as fundamental drivers of business value.

Although the structure and transparency of executive compensation practices have come a long way since the “fat cat” headlines of the 1990s, the journey appears far from complete.

Read more details about this research as featured in a recent article in the Financial Times.

A Journey into Transcription, Part 4: The Question Question

question: (NOUN) A sentence worded or expressed so as to elicit information.

Since we speak in utterances (not sentences), most forms of punctuation are omitted in this corpus of learner language; the exceptions being apostrophes, hyphens and question marks. 

This blog concerns question marks.  (Warning: there are not many jokes!)

When we started transcription, the convention seemed simple and straightforward: Question mark indicates a question. This is easy to apply when questions are straightforward. For example, the following question types are easy to identify:

  • yes/no questions (do you like chocolate?);
  • wh- questions (where have you been?);
  • tag questions (rock music is popular isn’t it?);
  • either/or questions (did you catch the train or did you fly?)

However, very soon, we found ourselves in debate about whether and where to transcribe question marks in less straightforward utterances.  This enabled us to amend the convention and add illustrative examples.  In addition, transcribers created a Questions Bank and began to keep a log of decisions made regarding the transcription of question marks; this was done with the aim of achieving the consistency which we anticipate might be vital to researchers in the future. 

So here follows a reflection on some of the varied ways in which speakers can elicit a response in spoken discourse, along with remarks on whether or not a question mark is transcribed in the context of this corpus.

It is useful to keep two vital rules in mind:

  • For the learner language corpus it is the structure of the utterance that is crucial rather than the expression or tone of voice. 
  • If in doubt, leave it out!

Either/Or Adjusted Question:

Speaker adjusts wording and question structure remains.

  • so in Indian houses do you also have landline telephones or do they  are they disappearing?

Either/Or Anticipation Question:

Use of ‘or’ suggests a choice of alternatives is going to be presented but the questioner’s voice and pace tail off in anticipation of the listener’s response.

  • do you go to a special school? or… [note: the ellipsis would not be transcribed in the corpus]

Doubled Up Question:

Structurally, there may be two questions but only one question is actually being asked; question mark transcribed at the end.

  • is it important to do school trips do you think?

Rephrased / Clarified Question:

Multiple rephrased/related questions in quick succession; each is structurally complete, eliciting a single response.

  • in what area? in what field? do have you any idea?
  • what are you going to do when you finish at this school? what will you do next?

Wondering Question:

A question word (often ‘what’) within the utterance and transcribed with question mark.

  • it seems to me your class sizes you have what? forty five students in a class it seems to me they are very large

Question Word/Context Question:

Question word followed by context/detail; often for emphasis and expressing shock or surprise.

  • what? they have a party all day
  • when? in the middle of the night

Clarification/Qualification Question:

A question followed by qualifying phrase for emphasis or for clarification; question mark may be transcribed at the end…

  • what about education more broadly more generally?
  • would you make it more fashionable more stylish?

…or in the middle of the utterance.

  • what do you think the biggest problems are in Mumbai? the biggest pollution problems
  • is that your ambition? to design a bicycle

Interrupted (Clause) Question:

A clause inserted mid-question but structure remains and one main question is being asked.

  • what about looking at education not just at your school looking at education in general?

Implied Question:

Interrogative intonation communicates speaker’s aim to elicit information; however, in this corpus we focus solely on structure so no question mark is transcribed.

Useful test: is the utterance meaningful without interrogative intonation?  If so, no question mark is added.

S:            I thought I was late

E:            really

S:            yes I overslept


E:            and how are you today?

S:            I’m fine and you

E:            I’m fine too


E:            any questions for me about your topic

S:            yes have you ever been to New York?

Statement Question:

Again, interrogative intonation communicates speaker’s aim to elicit information but structurally there is no question in the second part of this utterance and so no question mark is transcribed.

E:            so what do you think is the answer then? you think that parents should be at home more

S:            no I think they should have the choice

Unclear Question:

Key words are unclear making question structure incomplete; no question mark is transcribed.

S:            <unclear=can you> repeat the question please

A Complex Utterance with a Question Structure:

A number of self-corrections but the structure of a question exists.

S:            and do you think it’s it’s good to be in to be in touch with many people and to and to and to con= er contact with your friends and erm and at your home for exa= on your home for example?

Interrupted Question:

If the question is interrupted, no question mark is transcribed; however, sometimes a short question structure remains.

S:            is he er good enough?  to

E:            mm

S:            you know develop India and make it a superpower

Interrupted Either/Or Question:

What would originally have been a single either/or question is interrupted resulting in two independent question structures which are each transcribed with question marks.

E:            do you think it’s a skill?

S:            erm I think

E:            or can you get better at it?

So this has been a glimpse at some of the many varied ways speakers use language to elicit a response. Time and again we chant our mantra: “If in doubt, leave it out”!

The full version of our Questions Bank is now pretty exhaustive.  Generally we find that utterances can be mapped onto existing example structures so we can be confident that the decision as to if/where to transcribe the question mark will be consistent with previous decisions. So the Questions Bank, for us, has definitely been a valuable transcription tool. 

CASS MA students to present at the Sheffield University Postgraduate Conference in Linguistics

A large part of an academic job is that researchers give formal talks about their work. This is something that all research students are aware of — we have been to countless lectures, heard visiting academics and experienced the talks organised by different research groups. So what happens when the tables are turned and students reach a point where they need to start formally presenting their own work?

In a word: panic. How do you write an abstract? How do you know if your work is good enough? How do you build up the confidence to give a formal talk in front of others? How do you know that, at the end of it all, your work won’t be ripped to pieces?

Abi Hawtin, Gillian Smith and I (all research students currently completing an MA in CASS) have recently been negotiating this minefield of questions. We decided to apply to present our papers at the two-day Sheffield University Postgraduate Conference in Linguistics 2015. This conference is a great ‘first-step’; it’s organised by postgraduate students for other postgraduate and doctoral students and, proud of the friendly and inclusive atmosphere at past events, they explicitly encourage first-time speakers. Opportunities like this also enable students from a range of universities to come together and discuss their experiences and interests. This results in a great variety of topics; for example, 2014 saw presentations on pragmatic language change, the Chinese V-V construction and challenges in making research transformative.

We’ve recently heard that all three of us have been accepted to present at the conference, so wish us luck. Below you can find out the topics of our research.

Construction of male and female identities by a misogynistic murderer: a corpus-based discourse analysis of Elliot Rodger’s manifesto by Abi Hawtin

On 23rd May 2014, Elliot Rodger killed 6 people and injured 13 others in California. He left behind an extreme and violently misogynistic ‘manifesto’ which outlined his views on women, and his plan to take revenge. I use corpus methods (collocation analysis) to analyse the ways in which Rodger constructs the identities of males and females in his manifesto in order to see if the way he views men and women represents a new, and more dangerous, type of misogyny than has previously been studied in detail.

Corpus methods have been used to analyse the representations of gender in language, with many studies finding that men are often represented in positions of power over women (Caldas-Coulthard and Moon, 2010; Pearce, 2008). However, there has been little corpus-based research into explicitly misogynistic texts. This corpus-based study of Rodger’s manifesto addresses that gap in the research to date.

By conducting a collocation analysis for both words and semantic tags I found that the dominant way in which Rodger represented females was as extremely powerful and men as oppressed. This can be seen in the collocation of ‘experience’ with both males and females, where Rodger talks about women as controlling which men get to have sexual experiences, and can also be seen in the semantic collocates ‘undeserving’ and ‘able/intelligent’ which show Rodger representing women as deeming him ‘undeserving’ and other men as ‘able’ to have experiences which he cannot. This is in contrast to the sexism found in previous research and represents what is often referred to as a ‘new misogyny’. I suggest that this is the key difference between Rodger’s (ultimately murderous) views and sexist views rooted in traditional patriarchy. 
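As background for readers new to the method, collocation analysis at its simplest counts which words occur near a chosen node word. The sketch below is a deliberately crude, frequency-only illustration with hypothetical names and an assumed five-token window; it is not the procedure used in this study, which would typically rely on corpus software and an association measure rather than raw counts.

```python
from collections import Counter


def collocates(tokens, node, window=5):
    """Count words occurring within `window` tokens either side of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        left = max(0, i - window)
        right = min(len(tokens), i + window + 1)
        for j in range(left, right):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    return counts
```

Calling `collocates(tokens, "experience").most_common(10)` on a tokenised text would list the ten words found most often near “experience”.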

Tweet all about it: Public views on the UN’s HeForShe campaign for gender equality by Róisín Knight

On 20th September 2014, Emma Watson gave a speech through which she formally launched the UN Women’s HeForShe campaign. She claimed that “no country in the world can yet say they have achieved gender equality” and asked men to be “advocates for gender equality” – to be the “he” for “she” (UN Women, 2014). In light of this speech, the purpose of this study is to investigate the public reaction to the campaign.

The majority of discourse analysis previously applied to the study of gender and language has been qualitative and based on relatively small amounts of data; there are clear advantages offered by combining discourse analysis with corpus linguistics methods (Baker, 2014: 6). Additionally, researchers have begun to look towards conversations shared online to explore public views. For example, Potts et al. (2014) and Tumasjan et al. (2010) use data from Twitter in order to explore a wider range of public ideologies. This study combines these two new approaches, through carrying out a corpus-assisted discourse analysis of views expressed on Twitter about HeForShe.

I created a corpus of tweets containing the hashtag #HeForShe. Through comparison to a reference corpus of a random collection of tweets, keywords were identified. Following the work of Baker and McEnery (forthcoming), these keywords were grouped based on functional similarities and used to aid the identification of different discourses. Three main discourses were found: the discourse of the HeForShe fight; the discourse of gender and the discourse of Emma Watson. A recurring theme throughout these discourses is that men were frequently presented as more powerful than women.

Exploration of these discourses provides an understanding of how the HeForShe campaign is perceived and presented on Twitter, potentially enabling the organization to make better use of Twitter (see Messner et al., 2013).

Negativity, medicalization and awareness: a corpus-based discourse analysis of representations of mental illness in the British press by Gillian Smith

A topic of recent interest has been the stigmatisation of mental illness. The British press have been accused of perpetuating this, providing the public with negative representations of mental illness based on misguided stereotypes (Bilić and Georgaca, 2007; Nawková et al., 2001; Stuart, 2003; Thornton and Wahl, 1996; Coverdale et al., 2002). Studies in this area, however, are often small-scale and psychiatrically-based, failing to address the linguistic manifestations of discourses. This paper presents a corpus-based analysis of representations of mental illness in the British press between 2011 and 2014, aiming to broaden the scope of earlier works, using a larger, more representative sample and a discourse approach focussing on mental illness’ portrayals in UK newspapers.

Keywords within the corpus created revealed the central themes of mental illness newspaper articles, indicating a focus upon medicalization, violence and severity and suggesting that the tone of press discussions of mental health is largely negative. In order to look at specific press constructions of ‘mental illness’, collocates of the term were grouped according to semantic preferences, which were subsequently used to identify key discourses surrounding the term. Again, the major discourses revealed centred upon medicalization and negative stereotypes, including violence and addiction.

These findings highlight that press representations of mental illness are considerably negative, which in turn perpetuates the stigma surrounding mental illness, as the press’ misrepresentations are the predominant source of public information. However, a minor discourse revealed by the corpus, awareness, concerns press discussion of the need for wider understanding of mental illness and of prejudiced attitudes, and suggests that, whilst the press portray mental illness in discriminatory ways, they attempt to change public opinion. It may be suggested, however, that for the press to raise full awareness, they first must address their own stigmatizing representations.


Towards Corpus-driven History of Contemporary Islamic Political Discourse in Turkey and Bosnia

Next month, CASS will welcome visiting researcher Dino Mujadzevic. Read more about his project in his own words, below.

As a visiting researcher during February and March 2015 at the ESRC Centre for Corpus Approaches to Social Science (CASS), I am looking forward to widening my knowledge of corpus-driven methods in order to integrate more empirically grounded methodology into my research on contemporary media and political discourses in Turkey and Bosnia. As the leading research centre focussing on interdisciplinary corpus-driven research on language in its social context, CASS was a natural choice for seeking theoretical and practical consultation, as well as assistance in the more technological aspects of carrying out a corpus-driven study. I was also attracted to the openness of CASS towards applications of corpus-driven methods to the study of history (which I consider to be my core discipline), as well as its expertise on topics related to Islam.

Since February 2014, I have worked as a postdoctoral fellow at the History Institute, Ruhr University Bochum (Germany). There, I am working on a research project entitled “Turkish Foreign Policy and pro-Turkish activism in Bosnia and Herzegovina (2002-2014): Discourse and actors”, which is funded by the Alexander von Humboldt Foundation. In this project, I examine the media promotion of Turkey in this country by applying the Discourse Historical Approach to CDA to textual material in Bosnian/Croatian/Serbian (BHS) and Turkish produced by state and non-state pro-Turkish actors, both Turkish and Bosnian Muslim. Academic research on recent Turkish foreign policy and conservative cultural trends has grown in recent years as a reaction to the very active, influential and visible Turkish involvement on the world stage, mostly in the Balkans and the Middle East. Bosnia and Herzegovina, with its large Muslim population and its legacy of recent war, has a special symbolic importance for the ruling political party. The systematic study of the discourse which drives the Turkish official and non-official foreign policy coordinated by the government is still in its early stages.

During my fieldwork research stay in Sarajevo in the summer of 2014, I started collecting textual material on Turkey in Bosnian media since the 1990s. Additionally, in order to clarify the background of this material, I carried out numerous interviews with persons active in pro-Turkish and/or Islamic groups promoting Turkey and participated in public events and religious ceremonies.

Due to the very large amount of available media related to the research subject and the possibility of more comprehensive quantitative backing for my conclusions, I decided to extend my CDA research by applying the corpus-driven approach. Currently, I am building a corpus of pro-Turkish digitised texts from Bosnian media (in BHS and Turkish), collected from private digital media collections, the Internet and scanned newspapers.

I plan to segment the corpus into chronologically delimited corpora and to extract keyword nouns and their semantic fields (KWIC, collocations, word-clusters) from each one of these corpora. The extracted data would be used to analyse changes (or continuities) in discursive practices in the pro-Turkish discourse in Bosnia since the 1990s. Network visualizations (e.g. networks of keywords’ collocations) should assist with this task. Because I am still in the initial phase of acquiring technical and methodological knowledge related to corpus linguistics, I started a smaller pilot project to try out the corpus-driven approach. I collected all of Turkish Prime Minister Erdogan’s speeches (2003-2014), interviews and other statements in both English and Turkish which were available online. Currently, I’m writing a paper on the incorporation of Islamic references in his political discourse, which I plan to analyse by using the AntConc tool on the chronologically divided corpora of Erdogan’s political statements. The major problems I am facing in the scope of my pilot project include building a representative reference corpus and lemma lists for Turkish.
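As a rough illustration of the plan described above, the following sketch groups dated texts into chronologically delimited sub-corpora and produces simple keyword-in-context (KWIC) lines. All names and the period boundaries are hypothetical, and whitespace tokenisation is an assumption; this is not how AntConc works internally, only a minimal picture of the two steps.

```python
from collections import defaultdict


def segment_by_period(documents, boundaries):
    """Group (year, text) pairs into sub-corpora labelled by period.

    `boundaries` is a list of (label, first_year, last_year) tuples.
    """
    corpora = defaultdict(list)
    for year, text in documents:
        for label, first, last in boundaries:
            if first <= year <= last:
                corpora[label].append(text)
                break  # each text belongs to one period only
    return dict(corpora)


def kwic(text, keyword, width=3):
    """Return (left context, hit, right context) for each keyword match."""
    tokens = text.split()  # naive whitespace tokenisation
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines
```

Languages with rich morphology, such as Turkish and BHS, would need lemmatisation before exact-match lookups like this become useful, which is one reason lemma lists matter for the pilot project.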

My stay at LU is funded by the European Research Stay Programme of the Alexander von Humboldt Foundation.

Are you interested in being a visiting researcher at CASS? Email us at cass(Replace this parenthesis with the @ sign) with details about your project and your proposed time and duration of stay for more information.

New CASS Briefing now available — What words are most useful for learners of English?

What words are most useful for learners of English? Introducing the New General Service List. Learning vocabulary is a complex process in which the learner needs to acquire both the form and a variety of meanings of a given vocabulary item. General vocabulary lists can assist in the process of learning words by providing common vocabulary items. In response to problems identified in the currently available General Service List, the authors decided to investigate the core English vocabulary with very large language corpora using current corpus linguistics technology.

New resources are being added regularly to the new CASS: Briefings tab above, so check back soon.