Representing trans people in the UK press – a follow-up study

I do not identify as trans, nor did I carry out this research for profit or because I am an activist. I approached the subject from the position of allowing the data to speak for itself, and the corpus methods I use rely on computational techniques that are unbiased – computer software identifies the most frequent words, phrases and combinations of words, which then have to be accounted for by the analyst.

Introduction

A few years ago I published the “corpus linguistics” chapter in an edited collection on different methods of carrying out critical discourse studies. As a case study for the chapter, I decided to look at the representation of trans people in the British press. At the time, The Daily Mail had published a disapproving article about a trans person who was also a school-teacher, and who committed suicide three months later, while another article published in the Observer, one of the more respectable Sunday broadsheet newspapers, had used pejorative phrases about trans people like ‘a bunch of bed-wetters in bad wigs’ and ‘screaming mimis’. I wanted to use corpus approaches to see whether these articles were typical of the general press discussion around trans people or whether they stood out as unusually harsh. I built a corpus of around 900 articles (small by corpus linguistics standards), all from 2012, and used traditional corpus methods (keywords, collocates, concordancing) to examine a range of words like transgender, transsexual and trannie. My analysis found that the two articles mentioned above were at the extreme end of a continuum, although:

“the analysis did find a great deal of evidence to support the view that trans people are regularly represented in reasonably large sections of the press as receiving special treatment lest they be offended, as victims or villains, as involved in transient relationships or sex scandals, as the object of jokes about their appearance or sexual organs and as attention-seeking freakish objects. There were a scattering of more positive representations but they were not as easy to locate and tended to appear as isolated cases, rather than occurring repeatedly as trends.” (Baker 2014)

I was recently approached by the charity Mermaids UK, who asked me if I would carry out an updated analysis of more recent press representation. This time I collected data from the previous 2 years (21 October 2017 to 21 October 2019), resulting in a larger corpus of around 6,400 articles. Allowing for the longer collection period, this indicates that around three and a half times as many articles per year were written about trans people in the later period. In terms of news values, trans people are seen as rather more newsworthy these days. So has the discourse around them changed?
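The "three and a half times" comparison can be checked with a quick calculation. The sketch below uses only the rounded article counts given above and assumes the 2012 corpus covers one year and the later corpus two; the variable names are illustrative.

```python
# Article counts as reported in the text (rounded figures).
articles_2012 = 900       # one year of coverage (2012)
articles_2017_19 = 6400   # two years (21 October 2017 to 21 October 2019)

# Put both corpora on a per-year footing before comparing them.
per_year_later = articles_2017_19 / 2
ratio = per_year_later / articles_2012

print(f"{per_year_later:.0f} articles per year, {ratio:.2f} times the 2012 rate")
```

Because the input figures are approximate, the ratio is best read as "roughly three and a half" and not as a precise measurement.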

Changing Labels

In terms of how the press refer to trans people, in 2012 the most common term by far was transgender. In 2018-19, transgender and trans were about equally frequent, this being mostly an effect of the Guardian and Observer showing a strong preference for trans. Terms that I had expected to have died out, like sex-change and transsexual, had decreased somewhat but were still being used about once every other day, with the Mail, Telegraph and Times making up the bulk of such cases. Another decreasing term, tranny, occurred about once a fortnight. In 2012 it was used to imply bad taste, outlandishness, sex romps or the subject of jokes. The term was a particular favourite of journalist AA Gill (who used it in bizarre ways like tranny panto and tranny centaur night out). However, in 2018-19 it was mainly acknowledged as a bullying term (AA Gill died in 2016). The rather jarring use of transgender(s) as a noun (“How about One Guy, A Girl, A Transgender and Two Nonbinary Persons” (The Sun)) occurred 37 times in 2018-19 (there was only 1 such usage in 2012).

Collocates of trans(gender)

Examining the contexts in which trans and transgender people were written about showed one of the most notable changes, though. I’d noted in 2012 that transgender people were implied to be quick to take offence: in that year there were 8 cases of trans(gender) co-occurring with words like angry, clash, complaint, fury, offended, outrage, row, spat, upset and wrath. There was an enormous increase in this representation in 2018-19, with 586 cases. While a small number of these cases do not cast trans people themselves as the ones who are angry or complaining, the vast majority do, and the wider point is that trans people are being discussed as being at the centre of controversy. A similar set of words which relate to conflict, including aggressive, demand, harassed, bullied, confronted, lunge, militant, outspoken, pressure and threat, saw a similar pattern: 5 cases of these kinds of words appearing near trans(gender) in 2012, but 334 cases in 2018-19. The result is that trans people are constructed as newsworthy because they are difficult, angry, easily offended (and often unreasonably so).

Scout leaders have been told to avoid referring to children as boys and girls to ensure transgender members are not offended. (Mail on Sunday)

A transgender woman is demanding an apology and £2,500 compensation after claiming she was called “sir” by rail company staff. (Times, March 16, 2019)

It’s not a new representation. I saw the same thing when I looked at news stories about gay people in the early 2000s, Muslims in the 2000s and feminists in the 1990s and 2000s. Another representation (also used on gay people) was to link trans people with crime, connecting them to words like killer, prisoner, lag, criminal, murderer, rapist, jail and kill. These words occurred with trans(gender) 3 times in 2012, but 608 times in 2018-19.

‘It’s crazy to give trans prisoners everything they say they want,’ said chair Janice Williams. Why wouldn’t they lie in the circumstances? (Daily Mail)

Women’s jail holds trans lag born lad (The Sun, September 13, 2019)

Some of the trans brigade advocate the murder of Terfs as the best course. (Telegraph, 12 January 2019)
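For readers unfamiliar with collocation counts like those above, here is a minimal sketch of how co-occurrence within a fixed span can be tallied. It is not the software used for the study: the tokeniser, the window of five words either side and the function names are all assumptions made for illustration. The word list is the "offence" set quoted earlier, and the sample sentence is the Mail on Sunday example.

```python
import re

# Node word: "trans" or "transgender" (an illustrative pattern).
NODE = re.compile(r"^trans(gender)?$")

# The "offence" word set listed in the text.
OFFENCE_WORDS = {"angry", "clash", "complaint", "fury", "offended",
                 "outrage", "row", "spat", "upset", "wrath"}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:['’-][a-z]+)*", text.lower())

def count_cooccurrences(text, targets, window=5):
    """Count node tokens with at least one target word within
    `window` tokens on either side (window size is an assumption)."""
    tokens = tokenize(text)
    hits = 0
    for i, tok in enumerate(tokens):
        if NODE.match(tok):
            start = max(0, i - window)
            context = tokens[start:i] + tokens[i + 1:i + 1 + window]
            if any(word in targets for word in context):
                hits += 1
    return hits

sample = ("Scout leaders have been told to avoid referring to children as "
          "boys and girls to ensure transgender members are not offended.")
print(count_cooccurrences(sample, OFFENCE_WORDS))
```

A count like this only shows that two words appear near each other. As noted above, the analyst still has to read the concordance lines to see who is being cast as angry or criminal.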

Transphobia, trans children and the trans lobby

What about more general contexts? Which topics are trans people talked about in relation to more, or less, these days? Here we see potentially a change for the better. Topics that now take up less space in the overall debate involve references to transvestites and ladyboys as well as discussion of implants, the clothing worn by trans people and their ability to “pass” as a particular gender. There’s less of the inappropriate prurient interest in trans people that’s associated with sitcom characters like Alan Partridge. In its place, the biggest area of growth is in stories relating to transphobia and discrimination, although there were also increases in references to transitioning, inclusivity and gender-neutral pronouns.

Lest we think that references to transphobia indicate that the press are overall more concerned about trans people being abused, a closer look indicates this is not always the case. Although such references are 112 times more frequent in 2018-19 compared to 2012, 15% of the 2018-19 mentions put the word transphobia in quotes, implying authorial distance or even rejection of the term.

A transgender teenager who demanded the removal of a female Labour member from her post as women’s officer over her allegedly “transphobic” views has been elected to the post in her local Labour party. (The Times, November 20, 2017)
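One way to estimate the share of mentions that appear in distancing quotes is a simple pattern match. This is a rough sketch and not the procedure used in the study: the regular expression, the function name and the second sample string are assumptions made for illustration. It only catches cases where the quotes wrap the single word, so a quoted phrase such as “transphobic views” would be missed.

```python
import re

# Straight and curly quote characters that may wrap the term.
QUOTES = "\"'“”‘’"

# transphobia / transphobic / transphobe(s), optionally wrapped in quotes.
TERM = re.compile(
    rf"([{QUOTES}]?)\b(transphob(?:ia|ic|es?))\b([{QUOTES}]?)",
    re.IGNORECASE,
)

def quoted_share(texts):
    """Return (quoted, total): how many mentions have a quote mark
    immediately before and after the word, out of all mentions."""
    total = quoted = 0
    for text in texts:
        for match in TERM.finditer(text):
            total += 1
            if match.group(1) and match.group(3):
                quoted += 1
    return quoted, total

texts = [
    "over her allegedly “transphobic” views",   # from the Times example above
    "campaigners reported a rise in transphobia",  # invented for illustration
]
quoted, total = quoted_share(texts)
print(f"{quoted} of {total} mentions are in quotes")
```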

I took 100 random cases of transphobia and related words like transphobe and looked at them in more detail. Approximately half (47) used the term to raise questions about its validity: using the distancing quotes, referring to “supposed” or “alleged” transphobia, mentioning the way that the accusers behave (e.g. “howled down as transphobia”) or simply baldly stating that something is not transphobia.

An analysis of the term trans(gender) children found a slightly better picture. That term doesn’t occur in the distancing scare quotes – so the concept of trans(gender) children appears to be more accepted in the press than the concept of transphobia. An analysis of 100 random cases found 56 that accepted the existence of trans children and/or advocated that they should receive support. Thirty seven cases were more disapproving, either suggesting that children who identify as trans should not be supported in transitioning or that efforts to support them (e.g. through pronoun stickers or gender-neutral toilets) are unnecessary, even unhelpful. A further seven cases appear more neutral, noting that this is an issue which divides people but not clearly coming down on either side. It’s very rare to find voices of trans(gender) children in these press articles.
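The sampling-and-coding step described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the function names, the fixed random seed and the pool of 500 hypothetical concordance line numbers are invented, while the category counts (56, 37 and 7) are the ones reported for trans(gender) children.

```python
import random
from collections import Counter

def sample_cases(cases, k=100, seed=0):
    """Draw k cases at random without replacement.
    A fixed seed (an assumption here) makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(cases, k)

def summarise(codes):
    """Turn analyst-assigned codes into (count, percentage) pairs."""
    counts = Counter(codes)
    total = sum(counts.values())
    return {code: (n, 100 * n / total) for code, n in counts.items()}

# Hypothetical pool of concordance line numbers to sample from.
line_ids = list(range(1, 501))
sample = sample_cases(line_ids)

# Counts reported in the text for the trans(gender) children sample.
codes = ["accepting"] * 56 + ["disapproving"] * 37 + ["neutral"] * 7
for code, (n, pct) in summarise(codes).items():
    print(f"{code}: {n} cases ({pct:.0f}%)")
```

The coding itself is a manual judgement made by reading each sampled line in context; the code only handles the random draw and the tally.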

A final change relates to the increase in the phrase trans(gender) lobby. There were no mentions of this phrase in 2012. In contrast, 2018-19 saw 151 mentions of it, with over 90% of such cases writing about it in a negative way (e.g. as silencing debate, peddling politically-correct fallacies, being deranged or aggressively militant). The transgender lobby is described in somewhat contradictory terms across the press. At times, journalists go out of their way to stress that it is unimportant, referring to it as minuscule and doomed, yet at other times it is described as powerful, hegemonic and influential (with the implication that it should not be these things).

Conclusion

The UK press wrote over 6,000 articles about trans people in 2018-19. On the surface there appear to have been improvements – the more sexualising and joking uses of language around trans people have reduced since 2012 and there are many more stories around transphobia and inclusivity. However, there are large swathes of the press which write about these topics in order to be critical of trans people and many articles which consequently paint trans people as unreasonable and aggressive. The picture suggests that the conservative press and most of the tabloids have shifted from an openly hostile and ridiculing stance on trans people towards a carefully worded but still very negative stance.

Reference

Baker, P. (2014) ‘“Bad wigs and screaming mimis”: Using corpus-assisted techniques to carry out critical discourse analysis of the representation of trans people in the British press.’ In C. Hart and P. Cap (eds) Contemporary Critical Discourse Studies. London, Bloomsbury: 211-236.

Time to Celebrate: Trinity Lancaster Corpus

On Wednesday 30 October, the ESRC Centre for Corpus Approaches to Social Science (CASS) organised a small get-together in its new location, Bailrigg House, to celebrate the research that is being carried out at the centre. Specifically, on this occasion, we wanted to highlight the Trinity Lancaster Corpus, a corpus of spoken learner English built in collaboration between Lancaster University and Trinity College London.

Cutting the cake with the Trinity Lancaster Corpus logo

We are really proud of the corpus, which is the largest learner corpus of its kind. It took us over five years to complete this part of the project. Here are a few numbers that describe the Trinity Lancaster Corpus:

  • Over 2,000 transcripts
  • Over 4.2 million words
  • Over 3,500 hours of transcription time
  • Over 10 L1 and cultural backgrounds
  • Up to four speaking tasks

A balanced sample of the corpus is now available for online searching via TLC Hub (password: Lancaster1964). To read more about the corpus and its development, check out this article in the International Journal of Learner Corpus Research:

Gablasova, D., Brezina, V., & McEnery, T. (2019). The Trinity Lancaster Corpus: Development, Description and Application. International Journal of Learner Corpus Research, 5(2), 126-158. [open access]

A new special issue of the journal featuring articles on various aspects of learner language, which use the Trinity Lancaster Corpus as their primary data source, is available from this link.

Table of contents of the special issue of the International Journal of Learner Corpus Research

A cake to celebrate the Trinity Lancaster Corpus

Celebrations at CASS (posters featuring research on TLC in the background)

Islam in the Media – A new CASS project working with The Aziz Foundation

We are very pleased to announce that in our next CASS project we will be working with The Aziz Foundation to examine representations of Islam in the British press. The project will be led by Tony McEnery (Principal Investigator) with Gavin Brookes as Co-Investigator. We are also delighted to announce that Isobelle Clarke will be joining us later this year as Research Fellow on this project – welcome Isobelle! (introductory blog post to follow…).

The aim of this research will be to expand on previous work on this topic carried out by members of the Centre. The project will be methodologically innovative, devising new techniques and adapting existing methods to afford new insights into representations of Islam in the UK and how these vary across different parts of the country and over time. Specifically, this project will be structured according to the following three strands:

  1. Examining press representations of Islam over time. This will involve expanding the University’s existing database of press articles about Islam – which currently represents national news articles up to 2015 – allowing for a comparison of representations of Islam over time, from 1998 to the present day.
  2. Comparing national and regional press representations of Islam. As well as providing insight into what is, in the regional press, an under-researched area of media representations of Islam, this strand will also be able to address hypotheses which suggest that Muslims positively appraise local over national media coverage of Islam and Muslims (Open Society Institute, 2010: 215). In addition to expanding the existing dataset to include articles published up to the present day, as per (1), this strand will entail the expansion of this dataset to include regional (as well as national) newspaper articles about Islam published from 1998 to the present day. By studying temporal changes in both the national and regional press, this project will be able to assess the extent to which any shifts are uniform across both tiers (local/national) or, on the other hand, whether divergences between the two actually become starker over time.
  3. Exploring the social effect of press representations. This strand will analyse how readers respond to both positive and negative framings of Islam in the ‘below-the-line’ comments which accompany the articles in the data. This strand will therefore take a wider view of societal discussions of Islam, comparing readers’ perspectives against press representations in order to ascertain the extent to which such representations might influence but also be challenged by the public. By exploring comments on articles which contain positive as well as negative representations – as these are identified in (1) and (2) – this strand will provide useful evidence for demonstrating to the media the social effects of constructive journalism over poor journalistic practice.

We will actively engage members of the British Muslim community in our research by sharing our findings with them, listening to their thoughts and feedback, and helping them to read media texts more critically in order to challenge negative representations in the future, for example by formulating complaints that are informed by (corpus) linguistic insight. We are excited to begin this project and are looking forward to working with The Aziz Foundation and, of course, to welcoming Isobelle to Lancaster!

‘Collaborations between Linguistics and the Professions’ event – Three participants’ views

On 4th-6th March 2019, we organised an event on ‘Collaborations between Linguistics and the Professions’. If you missed it, here are three reports from early-career researchers – one for each day.

Day 1 – by Mathew Gillings

The aim and focus of the Collaborations between Linguistics and the Professions event was to look at connections and opportunities that arise between the academic discipline of linguistics and wider industry. Broadly speaking, the event considered how university-based linguists may be able to advise in both the public and private sectors, providing consultancy to help inform real-world issues. As a PhD student applying corpus methods to the study of deceptive language, the first day of the three-day event was of particular interest to me, due to its focus on forensic linguistics and public engagement.

After a quick welcome and opening of the event from Elena Semino, we started with a talk by Louise Mullany. Louise works at the University of Nottingham, but also carries out linguistic consultancy through the Linguistic Profiling for Professionals unit she directs. Louise discussed how her team has worked within a whole range of sectors and applied linguistic theory to help inform their practice. For example, politeness theory might well provide the answers to why one firm is struggling with their customer service; or perhaps an investigation into gendered talk might reveal some underlying problems or tensions. Perhaps even other methods could provide further insight, such as eye-tracking, or putting clients through an online learning course. Louise’s talk gave a good insight into how such a unit operates.

The second talk built on the first one, with Isobelle Clarke showing us not only what you need to think about and be aware of, but also what you shouldn’t do when trying to build a reputation as a linguistic consultant. Although Isobelle has already had some good opportunities through the connections made throughout her PhD, she argued that her reputation will always be questioned for the stereotypes that come with the territory. For example: she is female, unlike most forensic linguistic consultants; she is from Essex, and therefore has an accent that is often prejudiced against; and she is also still early-career. These are things she cannot change, but still unfortunately affect whether or not she is considered credible as an expert in the field. It was good to hear such an open and frank discussion about inequalities within the field.

Continuing with the forensic linguistics theme, Georgina Brown offered an insight into how new methodologies within forensic speech science are now being used to inform proceedings within the courtroom. Georgina introduced us to Soundscape Voice Evidence, a new start-up based right here in the Department of Linguistics and English Language at Lancaster, which is all the proof you need that there is a real appetite for further collaboration between academia and industry.

Another interesting talk later that day was by Lancaster’s very own Claire Hardaker, who talked about learning when to say ‘no’ to opportunities that come your way. Claire discussed several cases where, due to her own excitement, she may have jumped into a new opportunity too quickly. As a PhD student, it was good to hear advice on how to handle different cases, and especially on how to be careful in picking which cases to pursue. Likewise, this seemed to be a common theme over the three days, with a whole range of attendees discussing issues they had encountered whilst carrying out this kind of collaborative work.

The day came to a close with Tony McEnery’s talk discussing linguistics and the impact agenda. Tony reflected on some of his own experiences working with various agencies outside of academia, but the bulk of his talk concerned impact work against the backdrop of the REF. Tony gave some top tips for how to get your research out there and informing public life through the Civil Service, but also spoke very realistically about the priority it will be given by others. Everyone is busy – those in academia and those outside of it – and we must not lose sight of that. Tony finished with a call to arms: language pervades each and every aspect of our life, and it is clear that the discipline has a lot more to offer than it has traditionally done in the past.

Day one of the Collaborations between Linguistics and the Professions event was enlightening. I, for one, never knew quite how much consultancy work linguists are involved with, and it was refreshing to see such a healthy appetite for it within the room – especially from early-career researchers, like myself, who may well do it in the future.

Day 2 – by Sundeep Dhillon

I attended the education focused sessions on Tuesday, given that my background is in English language teaching and, as a current ESRC doctoral student in Applied Linguistics, I was keen to find out more on collaborations between linguistics and the professions. The day started with a warm welcome from Elena Semino prior to the first presentation. Alison Mackey spoke about her work as a linguistics consultant in the private sector which ranged from educational technology companies to private schools in the USA. Alison gave lots of varied (and humorous) examples of the consultancy work and how she achieved these contracts, which she traced back to three key factors. These were networking and word of mouth referrals, the publication of a book on bilingualism by Harper Collins, and a Guardian article which has over 65,000 shares on Facebook. I was impressed by the range of Alison’s work activities, proving that linguistics can be widely applied to real-life practical contexts.

One of the schools Alison has worked with in the USA, ‘Avenues: The World School’, was then represented by Abby Brody. The private school has an innovative approach to teaching as students are immersed in Spanish or Mandarin (alongside English) with the aim of becoming ‘truly fluent’. The links between linguistics and the school’s curriculum development over time were outlined. It was clear that the school was responsive to research and adapted their teaching and learning practices accordingly.

The next presentation by Judit Kormos was very inspiring in that the linguistics research has led to a direct impact on the way inclusive practices are promoted in educational publishing and second language assessment. Judit’s research on specific learning difficulties and L2 learning difficulties has aimed to give a voice and agency to those who are traditionally underrepresented. There were a number of examples given of working with publishers and government departments to develop strategies and ways of working which are inclusive. The success of Lancaster’s FutureLearn MOOC on Dyslexia and Foreign Language Teaching was also discussed and there is now an opportunity to join the next launch of this MOOC on the 15th April 2019.

Following a lovely long and well catered lunch break, we then heard from Claire Dembry of Cambridge University Press (CUP). Claire outlined the many opportunities for links between the publisher and academic research, including the recent Spoken British National Corpus 2014 (BNC) project in collaboration with Lancaster University. This project involved collecting 11.5 million words of spoken conversation and the BNC 2014 is now available online with free access. There are also opportunities to contribute to articles, books, research guides and white papers which are produced by CUP. Claire also answered questions on practical considerations such as contacting CUP, pitching ideas and negotiating fees, all of which was useful information to consider prior to any collaboration.

We then heard from Vaclav Brezina about corpora and language teaching and learning. There were three main sections in the presentation – accessibility, research partnerships and interdisciplinarity. Accessibility covered the link between theoretical ideas of linguistics and the practical tools and techniques used in projects such as the BNCLab and #LancsBox. Research partnership highlighted the importance of collaboration with others such as CUP and Trinity College London. Finally, interdisciplinarity covered good practice guidelines on working with others including flexibility and collective ownership of goals.

Cathy Taylor of Trinity College London presented ‘The Spoken Learner Corpus’ (SLC) project, collaboratively undertaken by Trinity College London and CASS, Lancaster University. This has involved collecting data from Trinity’s Graded Examinations in Spoken English (GESE) at B1 level and above, leading to the creation of the SLC, which can be explored for language teaching and research purposes. Cathy described the stages of the project including the rationale, the practical data collection of audio recorded exams from GESE and also the creation of teaching and learning materials based on the SLC. These materials are available on the Trinity website and cover topics such as managing hesitation and asking questions. This project is a great example of how corpus data can be used to inform and improve the classroom experience of English language learners.

The final presentation was by John Pill, who spoke about his experience of updating the Occupational English Test (OET), an English language test for medical professionals. Collaboration between the test developers, language researchers and medical experts was outlined, including tensions between them in relation to the expected content, assessment criteria and outcomes of the OET. Overall, the process to create a relevant language test which covered English language and also practical medical aspects was successful with an updated test being launched following the collaboration.

Each of the presentations linked the research within linguistics to applications in the wider education profession. There was a lot of useful information and plenty of food for thought for the audience in considering future collaboration activities.

Day 3 – by Joelle Loew

It is the third day of the conference – by now a familiar crowd is coming together around coffee in the morning, and the atmosphere is at ease. People have come from all over the UK and beyond to the beautiful campus of Lancaster University – I had flown in from Switzerland a while ago, where I am doing my PhD in Linguistics at the University of Basel. Everyone is looking forward to the last day, which brings together researchers and practitioners applying linguistics in various professions including media and marketing. We start off with a talk from Colleen Cotter from Queen Mary University of London on bridging ‘the professional divide’ between journalists’ and academics’ talk about language – she outlines journalistic language ideologies but also highlights journalistic audience design and corresponding readership-orientation as an example of how journalistic practice can feed into academic practice. After a quick refill, we gather again to hear Lancaster’s own Veronika Koller discuss her experience of opportunities and obstacles in linguistics consulting in healthcare. Throughout the presentation she refers to and outlines the main stakeholders in healthcare particularly relevant to linguistics:

https://twitter.com/ZsofiaDemjen/status/1103251411268182016

On we then go to hear Jeannette Littlemore from the University of Birmingham discuss her work with marketing and communications agencies on their use of figurative messaging. She focuses on the role of metaphor and metonymy for brand recognition, brand recall and consumer preference, drawing on examples from her research and work with the creative industry. Discussions following her talk continue into the lunch break. Refreshed and well fed, we move into an afternoon packed with insight from industry. Gill Ereaut brings in the perspective of a linguist working within the professions, introducing her consultancy Linguistic Landscapes. Their work includes evidence-based consulting for organisations on multiple levels, including organisational culture change. Another perspective from industry follows from Sandra Pickering of opento, who talks about the role of language in marketing. She provides a wide array of fascinating examples from her diverse experience with different organisations, and spends some time outlining how brands become metaphorical persons on their quest to build compelling brand narratives. The audience discusses some well-known brand narratives and archetypes of smaller and bigger players in the industry following her talk.

https://twitter.com/VeronikaKoller/status/1103306088953331713

Dan McIntyre and Hazel Price from the University of Huddersfield then present two very different case studies applying corpus linguistics in a private and a public setting with their consultancy Language Unlocked. The day ends with a Skype talk by Deborah Tannen from Georgetown University, who captures the audience with her account of why and how she writes for non-academic audiences. Her multiple and diverse experiences of writing for the broader public make for interesting insights on the differences between writing for academics and writing for a lay audience. She emphasizes the value of having to find simple terminology for expressing and simplifying complicated ideas. Her talk was followed by a lively discussion, as were the others in the day. Exploring opportunities and challenges in linguistic consultancy work through hands-on examples from different perspectives also highlighted recurrent themes, such as the importance of considering ethical aspects in this process. It also showed the tremendous potential and relevance of linguistics for a variety of different aspects of the professional world.

In sum, it was a fascinating day and a very inspiring conference overall – throughout the day it was evident that attendees genuinely felt the exchange between academics and practitioners applying linguistics in the professional world was very fruitful, and I am almost certain it is not the last we’ve heard of events such as this! It certainly broadened my own horizon as a PhD researcher looking at professional communication – showing many opportunities and highlighting the challenges to prepare for and navigate when seeking collaboration between linguistics and the professions.

CASS Down Under!

Earlier this month, a few members of the CASS team travelled to Australia to give talks and deliver workshops at two of the country’s most prestigious universities – Australian National University (Canberra) and The University of Sydney.

Our journey begins in Sydney, where on the 18th March, Paul Baker, Gavin Brookes, Tony McEnery and Elena Semino spoke at the Corpus Showcase and launch of the Sydney Corpus Lab. Elena was first to speak, as she presented findings from her research with Alice Deignan (University of Leeds) on metaphors for climate change in the classroom. Tony then spoke about his work studying shifts in the historical discourse surrounding venereal disease, before Paul concluded the morning session with a talk which brought together a series of corpus studies of sexual identities in personal ads. Following lunch, Gavin presented early findings from the Representations of Obesity in the News project, using keywords to compare tabloid and broadsheet data. So, a wide range of topics reflecting the diversity of the projects within CASS, which both our hosts and members of the audience commented on and seemed to enjoy. More information and photos from this event can be found here.

CASS Director, Elena Semino with Corpus Lab Director, Monika Bednarek

Monika Bednarek with Paul Baker, Tony McEnery, Laurence Anthony and Gavin Brookes (L-R)

Following the event in Sydney, our next stop was Canberra, where the team was joined by Dima Atanasova and Luke Collins for a series of corpus linguistics workshops at the Institute for Communication in Health Care (ICH), Australian National University (ANU).

Day 1 in Canberra: Paul Baker, Dima Atanasova, Shannon Clark (ANU), Luke Collins, Tony McEnery, Elena Semino, Diana Slade (Director of ICH, ANU), Gavin Brookes, Susy MacQueen (ANU) (L-R)

On the first day of our stint in Canberra, the CASS team exchanged details about our work on health communication with members of the ICH, with a view to future collaboration in this area. On days 2 and 3, we delivered a corpus linguistics workshop to approximately 50 delegates from an impressive range of disciplines, including health care, theoretical linguistics, creative writing and anthropology, to name just a few! On the first day of workshops, Tony introduced corpus linguistics, Paul spoke about his research using corpora in discourse analysis and Luke led practical sessions on corpus construction and collocation. On the second day, Tony gave a lecture on stats and Paul led a practical session on using corpus techniques in discourse analysis. Also on the final day, Elena and Gavin both gave lectures on the application of corpus methods to the analysis of health language data. Elena’s talk focused on metaphor in cancer and end-of-life care, before Gavin wrapped up the sessions with a talk about his work with Paul on NHS feedback in England. Our hosts were welcoming and hospitable, and the attendees were lively and engaged and seemed to gain in confidence as the workshops went on. In summary, this was a thoroughly productive and enjoyable experience that was more than worth the long journey. Now we just have to find an excuse to go back and find out how everyone is getting on!

Corpus methods and multimodal data: A new approach

By William Dance, Alex Christiansen and Alexander Wild

Within corpus linguistics, multimodality is a subject which is often overlooked.

While there are multiple projects tackling multimodal interactional elements in corpora, such as the French interaction corpus RECOLA and the video meeting repository REPERE, corpus linguistic approaches generally tend to struggle when faced with extra-textual content such as images. Until now, the only viable approach to including such content in a corpus has been manual image annotation, but such an approach runs into two overarching issues. First, the visual modality is the most labour-intensive form of multimodal corpus annotation when performed ‘by hand’. Second, multimodal corpora are often limited in scope as a result, and therefore remain very specialist and rely on relatively small datasets.

However, as Twitter and other social media are quickly becoming popular sources of natural text, it is important to recognise that ignoring images means ignoring a large portion of potential meaning. In the worst instances, texts become entirely meaningless without the context supplied by the image – take for example this relatively innocuous tweet about superheroes:

without its image content.

As opposed to with it.

The image in the example above comprises part of the meaning-making process, and without the image, meaning is lost. Although evidence of the number of posts which include images is scarce, an engagement analysis sampling 1,000,000 posts from Twitter tentatively noted that 42% included an image.

As a step towards fixing this omission, we are introducing a new methodological tool to the corpus linguistic toolbox, tentatively named Visual Constituent Analysis or simply VCA. As the name implies, the approach draws on the concept of grammatical constituency, presenting images as a series of individual semiotic constituents, which can then be shown in-line with any co-text found in the tweet. Using Google’s Cloud Vision API, VCA seeks to redress the issues of scalability and scope raised earlier by automating the annotation process and consequently widening the research scope, allowing studies to be extended to a much larger portion of multimodal data with very little extra work involved.
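To make the idea of showing image constituents in-line with co-text more concrete, here is a minimal sketch. It does not call the Cloud Vision API and is not our actual implementation: the `AnnotatedTweet` class, the `render_inline` function and the `<img> ... </img>` wrapper are all hypothetical names and conventions invented for illustration, and the example tweet and labels are made up.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedTweet:
    """A tweet's text plus the constituents detected in its image."""
    text: str
    # Hypothetical field: labels that an image annotator returned.
    image_constituents: list[str] = field(default_factory=list)


def render_inline(tweet: AnnotatedTweet) -> str:
    """Return the co-text followed by the image constituents on one line.

    The <img> ... </img> wrapper is an invented convention, chosen only so
    that a concordancer could tell image constituents apart from words.
    """
    if not tweet.image_constituents:
        return tweet.text
    # Join multiword labels with underscores so each counts as one token.
    constituents = " ".join(
        label.lower().replace(" ", "_") for label in tweet.image_constituents
    )
    return f"{tweet.text} <img> {constituents} </img>"


# Invented example; neither the text nor the labels come from real data.
example = AnnotatedTweet(
    text="Who would win?",
    image_constituents=["Superhero", "Comic Book", "Cape"],
)
print(render_inline(example))
```

Joining multiword labels with underscores is one possible design choice: it lets standard corpus tools treat each constituent as a single token in frequency lists and collocation searches.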

In addition to extending the scale of analysis, Vision also supplies information that would otherwise be missed by most annotators. This includes the function called web entities, which retrieves the set of all indexed web-pages using a particular image and extracts the most representative keywords from the context the image was used in. As an example, note that in the sample image below Vision detects only that the image contains a ‘journalist’/‘commentator’ and that there is a ‘photo caption’, while web entities highlight that the people in question are Sean Hannity and Mitch McConnell, as well as the fact that the image relates to Fox News, the Speaker of the United States House and the United States Congress.

Input

Output

Labels: Journalist; Commentator; Facial Expression; Person; Forehead; Photo Caption; Chin; Official
Web Entities: Sean Hannity; Mitch McConnell; FOX News; Kentucky; United States Senate; Republican Party; Capigruppo al Senato degli Stati Uniti d’America; Speaker of the United States House; United States Congress; Election; President of the United States
Document: “STOP WHINING AND GET TO WORK”
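As a rough illustration of what web entities add beyond the labels, the sketch below stores the sample output above as plain Python lists and picks out the web entities that do not also appear among the labels. The values are copied from the sample output; the function name `web_only_entities` is a hypothetical helper, and no Vision API call is made.

```python
# Values copied from the sample output above.
LABELS = [
    "Journalist", "Commentator", "Facial Expression", "Person",
    "Forehead", "Photo Caption", "Chin", "Official",
]
WEB_ENTITIES = [
    "Sean Hannity", "Mitch McConnell", "FOX News", "Kentucky",
    "United States Senate", "Republican Party",
    "Capigruppo al Senato degli Stati Uniti d’America",
    "Speaker of the United States House", "United States Congress",
    "Election", "President of the United States",
]


def web_only_entities(labels: list[str], web_entities: list[str]) -> list[str]:
    """Return the web entities that do not also occur as labels.

    The comparison ignores case, so 'Person' and 'person' would match.
    """
    seen = {label.casefold() for label in labels}
    return [entity for entity in web_entities if entity.casefold() not in seen]


added = web_only_entities(LABELS, WEB_ENTITIES)
print(f"{len(added)} of {len(WEB_ENTITIES)} web entities are not labels:")
for entity in added:
    print(" -", entity)
```

For this sample, none of the eleven web entities overlaps with the labels, so all of them are returned: the two layers describe the same image in entirely different terms.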

While we recognise that there are obvious issues with allowing an algorithm to take over the task of annotating images, we posit that the same issues are inherent to human annotation, perhaps to an even larger degree. Within the traditional annotation method, a human annotator is required to process the non-textual data by hand, with implications for scale, consistency and knowledge base. Vision offers vast scalability as well as the web indexing power of Google, and consequently can help to analyse large multimodal datasets that would otherwise require teams of human annotators to process.

To test the viability of the approach as well as the reliability of the data-labelling supplied by Google’s neural network, we will use VCA to analyse the use of images in hostile-state information operations on social media in Twitter’s recently released Internet Research Agency dataset (T-IRA). T-IRA includes all the users identified by Twitter as being connected to Russian state-backed information operations and contains more than 9 million tweets, including a database of more than 1.4 million images.

This project will test the viability of VCA as a method of corpus construction but will also provide insights into how information operations weaponise images on social media. Using VCA, we will seek to identify the strategies used in T-IRA to try to influence people’s political and social views. Drawing on studies of online disinformation as well as linguistic studies of manipulation, we will codify these strategies into a typology of online image-based manipulation.

A cognitive scientist’s perspective on taking the CorpusMOOC

Rose Hendricks, a researcher at the Frameworks Institute in Washington D.C., shares her experience of taking the CorpusMOOC:

‘I’m a social science researcher and have been curious for a while how we can learn more about human culture and cognition by looking at large collections of language — so I jumped at the opportunity to take the Corpus Linguistics online course by Lancaster University.

The course had a great mix of videos, readings, and activities, and covered topics in just the right amount of detail. There was enough information to get a good sense of how corpus linguistics methods can be used in a huge range of ways, from addressing questions in sociolinguistics to developing textbooks, dictionaries, and resources for language learners.

Conversations with researchers who use corpus linguistics methods gave us an even deeper sense of the interesting and important topics that benefit from tools to extract patterns from huge amounts of text.

Throughout the course, I came up with many ideas I plan to explore with the methods we learned about, especially #LancsBox, a tool that helps researchers analyze and visualize their language data.

I would recommend this course to people with any level of background knowledge on the topic — there’s something for everyone.’