Mentoring Social Science Researchers in Corpus Methods and Critical Discourse Analysis: Final Symposium at Keele University

Yuze Sha, with Luke Collins

On 25th January, members of the ESRC Centre for Corpus Approaches to Social Science joined colleagues from across universities in the North West to participate in a Symposium celebrating the work of postgraduate students who had been learning to use methods from corpus linguistics in their existing research.

In September 2022, Dr Luke Collins (Lancaster University) and Dr Kathryn Spicksley (Keele University) secured funding from the ESRC’s North West Social Science Doctoral Training Partnership, as part of the ‘Methods North West’ initiative, to launch the programme “Mentoring Social Science Researchers in Corpus Methods and Critical Discourse Analysis”. The aim of the programme was to set up a mentoring scheme, pairing postgraduate students based at Lancaster University with expertise in corpus linguistics (and optionally, critical discourse analysis) with postgraduate researchers working in other areas of the social sciences, to provide guidance on developing their understanding and skills in corpus linguistics research.

Five postgraduate researchers from Lancaster University with expertise in the field – Anastasios Asimakopoulos, Kevin Gerigk, Hanna Schmück, Yuze Sha and Yanni Sun – each offered their time and expertise to one of five mentees based at universities in the North West Doctoral Training Partnership between September and January.

Despite having no prior training in corpus linguistics, our five mentees confidently presented their work at the Symposium, demonstrating how they had adopted concepts from corpus linguistics and found new ways to approach their study data. Presentation topics ranged from digital decisions in Higher Education (Fiona Harvey, Lancaster University) and the way the UK prison system holds power accountable (Irfan Pandor, Keele University) to the marketising discourse in Speed Awareness course provider websites (Emily Brannen, Keele University). The data sources for corpus construction were also diverse, including interviews (Yi-Fang Chen, Lancaster University) and newspaper articles (Rebecca Page-Tickell, Lancaster University).

Our invited guest speaker was Dr Gavin Brookes, who started proceedings with a talk demonstrating the application of approaches from corpus linguistics to the study of healthcare. Guests also heard from Dr Fabienne Emmerich, the ESRC NWSSDTP Institutional Lead at Keele University and Lecturer in Law, who provided some encouraging observations on the talks and the mentoring programme, recognising the engagement of our participants and the work that they had been doing with their mentors.

As a mentor, I thoroughly enjoyed the mentoring process and working with my mentee, Yi-Fang. Yi-Fang came to the programme with a dataset consisting of 20+ interview recordings, averaging 2 hours in length. Given the amount of data, one of her concerns was ensuring that she had an objective way to identify a sample that she could be confident was representative of the larger dataset. We established the key questions she wanted to explore in her study and co-designed a corpus-assisted critical discourse analysis approach for her project, outlining the ways that several corpus linguistics methods could be incorporated into her study to help her focus on particular extracts and support her qualitative analysis.

Yi-Fang was enthusiastic about using corpus methods to help identify statistically prominent features of her dataset, which helped her pick up the thematic topics in a relatively objective way. She then focused on the relevant concordance lines, applying a more detailed, qualitative analysis of the linguistic features and the ideologies behind key features. Collocation analysis using GraphColl (accessed using #LancsBox) also helped Yi-Fang to visualise connections between key language features in the data and do so in the original Chinese language of the texts.
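For readers unfamiliar with collocation analysis, the basic idea can be sketched in a few lines of Python. This is a minimal illustration only, not how GraphColl itself is implemented: the function name `collocates`, the window size and the particular mutual information (MI) formula are assumptions made for this example, and Chinese text would first need to be segmented into words by a separate tool.

```python
import math
from collections import Counter

def collocates(tokens, node, span=5, min_freq=2):
    """Rank words co-occurring with `node` within +/- `span` tokens by MI score."""
    n = len(tokens)
    freq = Counter(tokens)   # overall frequency of every word
    co = Counter()           # co-occurrence counts around the node word
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo, hi = max(0, i - span), min(n, i + span + 1)
        for j in range(lo, hi):
            if j != i:
                co[tokens[j]] += 1
    results = []
    for word, observed in co.items():
        if observed < min_freq:
            continue
        # Expected co-occurrences if the two words were unrelated,
        # corrected for the size of the window (2 * span positions).
        expected = freq[node] * freq[word] * 2 * span / n
        mi = math.log2(observed / expected)
        results.append((word, observed, round(mi, 2)))
    return sorted(results, key=lambda r: r[2], reverse=True)
```

Calling `collocates(tokens, "nurse", span=2)` on a list of word tokens returns tuples of collocate, observed frequency and MI score, which is the kind of information a collocation graph then displays visually.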

This was the first time the mentoring programme had been carried out and, based on the Symposium, we have been encouraged by our participants’ interest in and application of corpus methods. We hope that establishing this network is just the beginning for building working relationships between postgraduate colleagues in the North West and sharing the advantages offered by corpus analysis.

#LancsBox X: Innovation in corpus linguistics

CASS has always been associated with innovation in corpus linguistics. Innovation comes in different forms and guises, such as the creation of new corpora and tools as well as novel applications of corpus methods in a wide range of areas of social and linguistic research. With increasing demands on the sophistication of corpus linguistic analyses comes the need for new tools and techniques that can respond to these demands. #LancsBox X is one such tool.

#LancsBox X is a free desktop tool that can quickly search very large corpora (millions and billions of words), consisting of either simple texts or richly annotated XML documents. It produces concordances, summary tables, collocation graphs and tables, wordlists and keyword lists.

On Friday 24 February 2023, a new version of #LancsBox X was released. To mark this occasion, we organised a hybrid event, which attracted over 1,300 attendees. This event was co-sponsored by CLARIN-UK. A recording of this event is available above.

Launching #LancsBox X (Margaret Fell LT, Lancaster University)
CASS team supporting the event (others were helping online).
Online support of the event

Celebrating the Written BNC2014: Lancaster Castle event

On 19 November 2021, the ESRC Centre for Corpus Approaches to Social Science (CASS) organised an event to celebrate the launch of the Written British National Corpus 2014 (BNC2014). The event was live-streamed from a very special location: the medieval Lancaster Castle. There were about 20 participants on site and more than 1,200 participants joined the event online. Dr Vaclav Brezina started the event and welcomed the participants from over 30 different countries. After the official welcome by Professor Elena Semino and Professor Paul Connolly, a series of invited talks was delivered by prominent speakers from the UK and abroad. The talks covered topics such as corpus development, corpora in the classroom, corpora and fiction and the historical development of English.

The BNC2014 is now available, together with its predecessor the BNC1994, via #LancsBox X.

#LancsBox X interface

More information about the design and development of the Written BNC2014 is available from this open access research article:

If you missed the event, we offer the recording of the individual sessions below. You can also view the pdf slides about the Written BNC2014.

Online programme: Lancaster Corpus Linguistics
Vaclav Brezina, Elena Semino, Paul Connolly  (Lancaster University): Welcome and Introduction to the event
Tony McEnery (Lancaster University): The idea of the written BNC2014
Dawn Knight (Cardiff University): Building a National Corpus:  The story of the National Corpus of Contemporary Welsh
Vaclav Brezina and William Platt (Lancaster University): Current British English  and Exploring the BNC2014 using #LancsBox X
Randi Reppen (Northern Arizona University): Corpora in the classroom
Alice Deignan (University of Leeds): Corpora in education
Dana Gablasova (Lancaster University): Corpus for schools
Bas Aarts (University College London): Plonker of a politician NPs
Marc Alexander (University of Glasgow): British English: A historical perspective
Michaela Mahlberg (University of Birmingham): Corpora and fiction
Martin Wynne (University of Oxford): CLARIN – corpora, corpus tools and collaboration
Vaclav Brezina: Farewell

Talking Health Online

On 21 October 2021, the ESRC Centre for Corpus Approaches to Social Science hosted a webinar entitled, “Talking Health Online: Why it matters and what linguistics can contribute”, as part of a series of events organised by the International Consortium for Communication in Health Care (IC4CH). The IC4CH is an initiative that brings together language and communication researchers and health care practitioners at an international level, to translate the findings of interdisciplinary research to improve healthcare practice. The Consortium includes members from the Australian National University, Nanyang Technological University, Lancaster University, University College London (UCL), the University of Hong Kong and Queensland University of Technology. Like the Consortium, this webinar event brought together colleagues from around the world, with speakers from Lancaster University, UCL and Nanyang Technological University.

The webinar centred on online forms of health communication, particularly online forums, and featured a range of perspectives from scholars at different career stages. It was delivered as a conversation between our chair, Professor Tony McEnery, and each of our speakers, giving attendees the opportunity to hear about a range of projects involving linguistic analyses of health care communication.

The first of our speakers, Prof Joanna Zakrzewska, is a practising consultant trained in oral medicine and an honorary professor at UCL. Joanna specialises in a condition called trigeminal neuralgia, a severe pain condition affecting the face, and talked about her work with the Trigeminal Neuralgia Association establishing various support services, including an online forum. Working with Professor Elena Semino, Joanna and her team were able to get a clearer understanding of the types of interactions that were taking place on the forum and identify areas where there was a need for input from medical professionals. Subsequently, the forum has functioned as a source of quality information regarding trigeminal neuralgia, as well as a space for users to find empathy and compassion among peers.

Joanna’s case study indicated what kinds of insights are afforded by linguistic analysis and, in particular, corpus linguistics. Our next speaker, Dr Tara Coltman-Patel, offered further details on what linguistics can contribute and what is involved in corpus linguistic analysis. Tara is a Senior Research Associate in CASS, working principally on the Quo VaDis (Questioning Vaccination Discourse) project, which involves investigating social media, parliamentary discourse and news media, alongside forum interactions. Tara emphasised the evidence-based approach that computational analyses of large datasets afford and, in detailing some of the procedures involved in corpus analysis, demonstrated how researchers can uncover linguistic strategies used for rhetorical effect in discussions around health issues such as vaccination.

Next to speak was Professor Elena Semino, Director of CASS, who offered further details on the analytical approach used to investigate the trigeminal neuralgia support forum. Elena has strong research expertise in studying metaphor and was also able to provide examples of her work developing the Metaphor Menu for people living with cancer. Responding to long-standing debates about the impacts of conceptualising experiences of cancer as, for example a ‘battle’ or a ‘journey’, Elena’s research team found that people respond differently to such metaphors: that while one person can find the idea of preparing for ‘battle’ empowering, this framing can be highly detrimental to those who feel it can be a battle lost. The Metaphor Menu is a resource, in the form of a leaflet and postcards, that presents patients and practitioners with a range of metaphors used by patients to describe their experiences with cancer and is recommended by Cancer Research UK. As with a restaurant menu, patients have the opportunity to adopt the framing of their choosing, or indeed create new ‘recipes’, that help them to view their situation in more empowering ways.

The conversation continued with a focus on the patient experience, with questions directed towards Dr Gavin Brookes, a Research Fellow in CASS who offered some reflections on his work exploring Patient Feedback provided to the UK’s National Health Service (NHS). In this work, Gavin was able to investigate the combination of quantitative metrics (feedback scores) and qualitative comments (free text responses). One of the challenges of the study was establishing what type of respondents were providing feedback, with very little information about personal characteristics such as age, gender, where they were from etc. The solutions developed by Gavin and his collaborators in extracting such information from the data they had available also generated insights into how respondents would disclose personal characteristics as an argumentative strategy. Furthermore, Gavin recounted some of the observations they were able to provide to the NHS, to better understand the feedback form itself and the nature of the responses they received.

Our final speaker was Professor May O. Lwin, Chair and President’s Chair Professor of Communication Studies at Wee Kim Wee School of Communication and Information, Nanyang Technological University. One of the many areas of research May has been involved in is the study of public communications during epidemics. May recounted some of the observations she and her team have made of conversations on Twitter in relation to Covid-19. Using a technique called sentiment analysis, May was able to track references to emotional states and assess the trajectories of various communities around the world as the pandemic developed. May told us how fear and then anger were dominant emotions expressed on Twitter, but that there is also evidence for expressions of hope and gratitude as members of those communities look to support each other. May’s work demonstrated the influence that public communications from the government had on the overriding sentiment of conversations on the topic, and so it is important to think about the language used in those announcements and how they shape the public mood.

Our speakers then took questions from the audience, providing a view of what is involved in accessing and securing online forms of health communication data, in collaborating with practitioners and in working with large and diverse datasets. This part of the discussion again reiterated the value of interdisciplinary work and, in fostering that interdisciplinarity, working to make your research accessible and finding common ground. In this respect, the webinar echoed one of the core values of the International Consortium for Communication in Health Care: bridging the divide between academic and practitioner worlds based on a shared commitment to understanding and improving health communication.

You can watch the video here: Video

You can view the transcript here: Transcript

Blamed, shamed and at-risk: How have press representations of obesity responded to the COVID-19 pandemic?

There’s no question that all of us within society have been impacted in one way or another by the ongoing COVID-19 pandemic. However, it’s also the case that the health and wellbeing of certain groups have been particularly affected. A review of evidence on the disparities in the risks and outcomes of COVID-19 carried out by Public Health England suggests the virus has ‘replicated existing health inequalities and, in some cases, has increased them’. One group at particular risk of experiencing serious complications from COVID-19 is people living with obesity. Another report from Public Health England, reviewing evidence on the impact of excess weight on COVID-19, concluded that ‘the evidence consistently suggests that people with COVID-19 who are living with overweight or obesity, compared with those of a healthy weight, are at an increased risk of serious COVID-19 complications and death’.

In this paper, published in Critical Discourse Studies, Gavin Brookes explores how British print media representations of obesity have responded to the pandemic. The study, which is the latest in the CASS project exploring representations of obesity in the British press, is based on purpose-built corpora representing UK broadsheet and tabloid coverage of obesity during the pandemic. The analysis involved the use of keywords which were obtained by comparing each corpus against two reference corpora: one representing general press coverage of COVID-19 and the other representing general coverage of obesity in the six months leading up to the start of the pandemic. In this way, the study could account for keywords (and attendant discourses) that were characteristic of press representations of obesity during the pandemic relative not only to general coverage of obesity, but also general coverage of the virus.

Compared to this more general reportage, both broadsheet and tabloid reporting of obesity during the pandemic was found to be more fatalistic, with people with obesity being particularly likely to be construed as dying, or at least as being at heightened risk of dying, from the virus. For the broadsheets, this is a marked change in tone, with the pandemic seemingly ushering in a more pronounced focus on the connection between obesity and mortality. While such fatalistic discourses are characteristic of tabloid coverage of obesity in general, it seems that this way of framing obesity has gained even more prominence in these newspapers during the pandemic.

People with obesity were also depicted as a strain on an already-overburdened NHS, for example by taking up hospital beds and requiring oxygen therapy to the extent that this need creates a shortage for the rest of the population. The solution put forward (particularly by the tabloids, and to a lesser extent the right-leaning broadsheets) is for people with obesity to lose weight, for instance through exercise and supplements, in order to ‘save’ the NHS. This results in the responsibilisation of people with obesity, both for ensuring their own health and that of the wider public. This includes being responsible for ‘saving’ the NHS, though notably the damage endured by the NHS at the hand of austerity politics over the last decade is, conveniently, elided.

The link between obesity and coronavirus affords the press a means by which it can maintain the newsworthiness of obesity in the context of what is, in COVID-19, a news story of global relevance. Meanwhile, the fatalistic and responsibilising depictions allow news agencies to key into the news value of negativity. Yet, discourses of personal responsibility are often criticised because they typically fail to grasp that obesity (along with other so-called ‘lifestyle’ conditions) is not simply the outcome of individual lifestyle choices, but likely results from a variety of factors (both individual and socio-political), over which individuals often have limited control. When the newspapers offer a public figure as privileged and powerful as Boris Johnson as a ‘role model’ for readers wanting to lose weight, they risk overlooking the influence of factors such as social privilege in the development of obesity.

When individuals and groups are blamed for problems in society, the result is the creation and propagation of stigma. The way much of the press has reported on obesity during the pandemic represents a ‘ramping up’ of weight stigma, with people with obesity not only being blamed for their own health challenges but also shouldering responsibility for problems with the NHS against the backdrop of the most severe public health crisis of modern times. The weight stigma that results from this kind of blame-loading may engender further negative attitudes towards people with obesity, resulting in internalised shame. Yet the consequences of weight stigma may also be intensified by the circumstances surrounding the pandemic, which have already adversely affected the population’s mental health. Meanwhile, the aforementioned report by Public Health England stated that ‘stigma experienced by people living with obesity, may delay interaction with health care and may also contribute to increased risk of severe complications arising from COVID-19’.

It’s not all doom-and-gloom, though, as the pandemic also seems to have given rise to other, less stigmatising, changes to the press’s approach to obesity. For example, the broadsheets, and to a lesser extent the tabloids, also focussed more on race-related health disparities than in usual coverage of obesity. Meanwhile, the right-leaning tabloids offered otherwise uncharacteristic criticism of the UK Government, in particular for its ‘Eat Out to Help Out’ scheme, which was presented as being hypocritical by encouraging people to eat out on the one hand while imploring them to lose weight on the other. From the perspective of promoting more balanced obesity coverage, which cuts across political allegiances, this could be viewed as an encouraging sign. However, it remains to be seen whether this, along with the other changes to press discourse ushered in by the pandemic, will be lasting or particular to this unique and unprecedented news context.

Tara Coltman-Patel – Introductory Blog

My name is Tara Coltman-Patel and I am so excited to be a new member of CASS.

I am working as one of the Senior Research Associates on the ESRC-funded Quo VaDis project (Questioning Vaccination Discourse: A Corpus-Based Study), which explores discussions about vaccinations in UK parliamentary debates, UK national newspapers and on the social media sites Twitter, Reddit and Mumsnet. Using a variety of corpus tools and techniques, we aim to gain a better understanding of the wide spectrum of pro-, anti- and undecided views surrounding vaccinations. Analysing how vaccinations are discussed across a variety of contexts, how the different views are communicated, and how people with different views interact, particularly on social media, will be invaluable for addressing vaccine hesitancy. With our results we aim to inform, facilitate and help design future public health campaigns about vaccinations. As vaccinations are a salient topic, especially given the time we are currently living through, I am extremely grateful to have the opportunity to work on this research.

Before joining CASS I was working at Nottingham Trent University, where I recently finished my PhD which focussed on weight stigma and the representation of obesity in the British Press. In doing so I explored how metaphors can sensationalise and dehumanise people with obesity, I explored how science is recontextualised and misrepresented, and I explored the linguistic strategies of representation used in personal stories about weight loss. I am currently in the process of turning that research into a book titled ‘(Mis)Representing Obesity in the Press: Fear, Divisiveness, Shame and Stigma’, which will hopefully be published towards the end of 2022. Weight discrimination is a topic I am incredibly passionate about and in addition to research I have also worked as an anti-weight discrimination advocate and have consulted on global campaigns with the World Obesity Federation.

Outside of research I am a massive book worm and I love to read, I’m obsessed with RuPaul’s Drag Race and I’m also a sucker for a nice beer garden. Before Covid I loved to travel and have backpacked around Australia, Thailand, The Philippines, Mauritius and South Africa. I have some amazing and memorable moments from those trips, from bad ones like falling off a (small) cliff in Mauritius and being bitten on my hand by a spider in Australia, to incredible ones like canyoneering in The Philippines and swimming with sharks in Australia and South Africa. Sharks are my favourite animal and I have a plethora of fun facts about them ready to share at any given moment, so you definitely won’t regret inviting me to parties …

To conclude, I’m really thrilled to be a part of CASS and the Quo VaDis project, and as I have run out of interesting things to say about myself, I’ll end this blog post here.

British Muslims Caught Amidst FOGs – A Discourse Analysis of Religious Advice and Authority

By Usman Maravia

In this blog entry, I will provide an overview of my latest article which explores the writing style of Islamic advice texts on COVID-19. The issues that were addressed in these advice texts were related to the topic of mosque closures, funerary rites, fasting during Ramadan, and suspending Friday and daily prayers to help curb the spread of COVID-19. These texts were being circulated in the UK in March and April of 2020, a crucial period wherein information was passed on to address issues that, in the scope of the study, British Muslims would face in Ramadan, which began on 25th April 2020.

The context

My interest in this topic was sparked by an unfortunate COVID-19 related death of an elderly Muslim from Walsall. A family member of the deceased stated in the Press that “It is imperative that we learn from this tragic loss and comply with Government guidelines to save lives”. What further caught my interest was that if the aim of the Islamic advice documents was to help Muslims stay safe during the pandemic, a unified and standardised message with collaboration between Muslim faith leaders and health professionals would have been helpful. Instead, a range of documents was found to be in circulation, leading to ambiguity about exactly what preventative measures British Muslims were to take and where exactly authority lay.

Moreover, the titles of these documents differed. Some were titled fatwa, which is a non-binding legal opinion of an Islamic legal expert, but still a document that could potentially carry much influence on Muslim communities in the UK. Some documents were written by healthcare professionals and were titled guidance documents. I wondered: do these documents carry the same weight as fatwas? Yet other documents were titled neither fatwa nor guidance but used a hybrid of the two categories. Again I wondered: why were these words used in the titles?

The FOG corpus

As such, I sought to identify (a) the underlying reasons behind the titling of the documents; and (b) the construction of discourses in the documents. In collaboration with my colleagues Zhazira Bekzhanova (Astana IT University, Kazakhstan), Mansur Ali (Centre for the Study of Islam in the UK, Cardiff University), and Rakan Alibri (University of Tabuk), we collected a total of 76 texts that were available online on websites of British mosques, Facebook pages and other online venues. We found that of these 76 documents, 14 were clearly titled fatwa. We also found that six documents were titled guidance documents, and an eye-catching 56 documents, which we refer to as other documents, included a range of words in their titles such as analysis, clarification, confirmation, guidelines, method, pathway, permissibility, plan of action, points, recommendation, response, ruling, and statement. This classification led to our jocular acronym FOG, i.e. fatwas, other documents, and guidance documents. This compilation then led to the creation of the specialised FOG corpus consisting of around 110,000 words.

We examined these written electronic texts in the social context of Muslims and COVID-19 in the UK, exploring the way language was used in real life in fatwas, guidance documents, and other documents. We then focused on the way the authors of these documents differ in their writing styles to create a certain impression on the audience by increasing, in Bourdieu’s terms, symbolic capital. We also drew on the representation of social actors (van Leeuwen, 1995) to decipher power relations across the FOG documents. References to both the text producers and the audience were analysed, explained, and interpreted through the prism of authorities.

Corpus methods

We applied corpus-assisted critical discourse analysis, which helped us to uncover important patterns in relation to FOGs. Using AntConc software, we analysed word frequencies, word lists, lexical bundles, collocations, concordance plots, and concordances to detect linguistic patterns in the FOG corpus. Corpus methods also gave us the tools to detect power hierarchies and inequalities within the texts. Moreover, our corpus-assisted study supports Brookes and McEnery’s argument that texts acquire symbolic capital through an accumulation of patterns of textual cohesion and rhetorical strategies. We found that the documents appear to follow an underlying hierarchy among British Muslim scholars.
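Two of the techniques mentioned above, concordances and lexical bundles, can be sketched in plain Python. This is only a rough illustration of the underlying ideas and does not reproduce AntConc’s behaviour or options; the function names, the context width and the frequency cut-off are assumptions made for the example.

```python
from collections import Counter

def concordance(tokens, node, width=4):
    """Return (left context, hit, right context) for each occurrence of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

def lexical_bundles(tokens, n=3, min_freq=2):
    """Return recurrent n-word sequences that occur at least `min_freq` times."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return [(" ".join(g), c) for g, c in grams.most_common() if c >= min_freq]
```

Given a tokenised text, `concordance(tokens, "fatwa")` lists every hit with its surrounding words for close reading, while `lexical_bundles(tokens, n=3)` surfaces repeated three-word sequences that may signal formulaic phrasing.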

Findings

To elaborate, a particular writing style can be found across the FOG documents. We found fatwas and guidance documents to be diametrically opposed in their textual style, whereas other documents were found to feature greater intertextuality while maintaining respect for the authority of muftis and their fatwas, but with reservations. The fatwas were found to be written by senior muftis and contained important references to the Qur’an and Muhammad, the Prophet of Islam. Fatwas also included legal terminology in Arabic related to Shariah law. Moreover, fatwas contained phrases such as ‘according to’ and ‘Allah knows best’.

Such a writing style is in accordance with the traditional writing style of fatwas and thereby holds higher symbolic capital. On the other hand, guidance documents were produced by healthcare professionals and did not contain such theologically related phrases but rather relied on scientific and medical language. Interestingly, we found the other category of documents to be written in a hybrid style of fatwas and guidance documents. Such a writing style appears to increase the symbolic capital of these documents and to empower the writers to challenge existing fatwas, whilst maintaining respect for senior muftis.

While the FOG documents reveal that multiple voices are welcome in addressing a national emergency, we recommend that a standardisation of documents, issued in collaboration with the NHS and senior muftis, could perhaps give a clearer action plan for British Muslims in future. As such, this study is intended to give an impetus to social scientists to explore the discourse of British Muslims and COVID-19 through a linguistic lens.

Our article is available to read in MDPI’s open access journal Religions. Additionally, further research is being carried out on the topic of COVID-19 by the British Islamic Medical Association (BIMA) as part of ‘Operation Vaccination’.

For my article on addressing vaccine resistance from an Islamic perspective, please read Vaccines: religio-cultural arguments from an Islamic perspective published by JBIMA.

‘Face masks’ and ‘face coverings’ in the UK press during the Covid-19 pandemic: Scottish vs. national newspapers

Carmen Dayrell, Isobelle Clarke and Elena Semino (Lancaster University)

1 Introduction

Since the beginning of the Covid-19 pandemic, the use of face masks or face coverings as a means of reducing the transmission of the virus has been a major area of debate in many countries around the world. In the UK specifically, the first nine months of 2020 saw a rapid change from a view of face masks as a medical piece of PPE that would not be appropriate or acceptable for the general population, to the establishment of non-surgical face coverings as a recommended public health measure in indoor public spaces, such as buses and supermarkets. As with other aspects of the response to the pandemic, during that time there were differences in the approach to face masks/coverings between the Scottish devolved administration and the Westminster government.

Table 1 provides a timeline summary of policy decisions concerning face masks/coverings on public transport, shops and schools in Scotland and England. For the most part, in Scotland face coverings were recommended or made mandatory earlier than in England. They are also mandatory in corridors and communal areas in Scottish schools, whereas in England this is at the school’s discretion.

| Month | Public transport | Shops | Schools |
|---|---|---|---|
| April | (28th) Scotland (recommended) | (28th) Scotland (recommended) | |
| May | (11th) England (recommended) | (11th) England (recommended) | |
| June | (15th) England (mandatory); (22nd) Scotland (mandatory) | | |
| July | | (10th) Scotland (mandatory); (24th) England (mandatory) | |
| August | | | (31st) Scotland (mandatory in corridors and communal areas) |
| September | | | (1st) England (school/college discretion in indoor communal areas) |
Table 1 – Timeline of policy decisions about the wearing of face coverings by the general public in Scotland vs. England.

Scotland has also had a lower incidence of Covid-19 than England. According to official UK government data, as of 30th December, 23 people per 1,000 had had at least one positive Covid-19 test in Scotland, in contrast with 39 people per 1,000 in England.

This blog post is concerned with references to face masks and face coverings in Scottish vs. national UK newspapers between December 2019 and August 2020, that is from the start of reports about a new type of pneumonia in Wuhan, China, up to the beginning of the 2020-21 school year in the UK.

2 Research questions

Overarching research question

How does press reporting on face masks and face coverings in Scotland compare with national UK reporting between December 2019 and August 2020?

Specific research questions

  1. How did the frequency of use of ‘face covering(s)’ vs. ‘face mask(s)’ change over time in Scottish vs. national press reporting?
  2. Were there any statistically significant differences in the relative frequencies of the use of ‘face mask(s)’ and ‘face covering(s)’, and of terms relating to places where face masks/coverings may be used, in Scottish vs. national press reporting?
  3. What are the differences and similarities in the collocations (co-occurrence of words) of ‘face mask(s)’ vs. ‘face covering(s)’ in Scottish and national press reporting?

3 Findings in brief

Finding 1 – Over time, ‘face covering(s)’ became more frequent than ‘face mask(s)’ in the Scottish press, but not in the national press.

Finding 2 – ‘Face covering(s)’ are mentioned much more often, relatively speaking, in the Scottish press than in the national press, alongside other terms for public indoor environments where they may be worn.

Finding 3 – Face ‘mask(s)’ and ‘covering(s)’ have partly different collocates, reflecting differences in status and associated narratives.

4 Data

The news aggregator service LexisNexis was used to collect articles that contained either the phrase ‘face mask(s)’ or ‘face covering(s)’ and that were published in a selection of national and Scottish newspapers in the period between 01.12.2019 and 31.08.2020.

Table 2 provides the numbers of texts and words included in each of the resulting two corpora: the Scottish Corpus and the National Corpus. For the National Corpus, we also provide figures for articles extracted from ‘broadsheet’ vs. ‘tabloid’ newspapers, constituting the Broadsheet and Tabloid subcorpora. (NB: For the national newspapers specifically, we selected the national editions only, thus excluding the Irish, Scottish and Northern Ireland editions.) Figures 1 to 4 below show the number of articles per newspaper title within each corpus.

| Corpus | Number of texts | Number of words |
|---|---|---|
| National corpus | 11,536 | 19,401,316 |
| The Broadsheet subcorpus | 6,631 | 16,657,194 |
| The Tabloid subcorpus | 2,419 | 1,264,952 |
| Scottish corpus | 1,084 | 588,894 |
Table 2: Number of texts and total number of words comprising each corpus
Figure 1: Number of texts from each national title

Figure 2: Number of texts from each broadsheet title
Figure 3: Number of texts from each tabloid title

Figure 4: Number of texts from each Scottish newspaper title

The Broadsheet subcorpus is by far the largest of all datasets, both in terms of the number of texts and the number of words (Table 2). Within that subcorpus, The Guardian and The Observer account for the highest number of articles, corresponding to 36% of texts and 83% of the words in that subcorpus (13,744,333 out of 16,657,194). The number of texts is more evenly distributed in the Tabloid subcorpus (Figure 3). The Daily Mail accounts for the largest number of texts (20%) but it is closely followed by The Express, The Sun and Evening Standard (17% and 15% each respectively). Within the Scottish corpus, most texts come from The Daily Record and The National (32% each).

5 Method

To answer question 1, we plotted the frequencies of the search terms used to collect the texts that comprise the corpora, ‘face mask(s)’ and ‘face covering(s)’. These figures give us an indication of how the level of attention fluctuated in the National and Scottish press over time.

To answer question 2, we carried out a ‘keyword’ analysis of the Scottish Corpus as compared with the National Corpus as a whole. Keywords are words that are much more frequent in a corpus of interest (known as the ‘study corpus’) than they are in another corpus (known as the ‘reference corpus’), where the difference is statistically significant. They can be interpreted as reflecting the most distinctive concepts and themes in a particular corpus. The analysis was carried out using WordSmith Tools, version 7.

For the calculation of keywords, we established that the candidate keyword should occur in at least 5% of texts in the study corpus. This thus determined the minimum frequency of each term, which varied from one corpus to another. The minimum frequency was 577 instances in the National Corpus and 54 in the Scottish Corpus. In terms of statistical tests, we combined the log-likelihood test (a statistical measure of confidence) with log-ratio as the effect size measure, using the following thresholds: a critical value higher than 15.13 (p < 0.0001) for the log-likelihood test and 1.5 as the minimum log-ratio score, discarding negative scores. Keywords were then grouped by theme through close reading of the concordance lines, that is, individual occurrences of each word with the preceding and following stretches of text.
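As an illustration only, the keyword criteria described above can be sketched in a few lines of Python. This is not the WordSmith Tools implementation: the function names, the 0.5 adjustment for zero counts and the example counts in the last line are assumptions made for the sketch. It uses the standard two-corpus log-likelihood formula and the log-ratio as the binary logarithm of the ratio of relative frequencies.

```python
import math

def minimum_frequency(n_texts, proportion=0.05):
    """Cut-off implied by requiring a word in `proportion` of the texts."""
    return round(n_texts * proportion)

def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Two-corpus log-likelihood for one word (study vs. reference corpus)."""
    combined = freq_study + freq_ref
    total = size_study + size_ref
    ll = 0.0
    for observed, size in ((freq_study, size_study), (freq_ref, size_ref)):
        expected = size * combined / total
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll

def log_ratio(freq_study, size_study, freq_ref, size_ref, zero_adjust=0.5):
    """Binary log of the ratio of relative frequencies (effect size)."""
    # Assumption: zero counts are replaced by a small constant to avoid log(0).
    fs = freq_study if freq_study > 0 else zero_adjust
    fr = freq_ref if freq_ref > 0 else zero_adjust
    return math.log2((fs / size_study) / (fr / size_ref))

def is_keyword(freq_study, size_study, freq_ref, size_ref,
               ll_min=15.13, lr_min=1.5):
    """Apply both thresholds; negative log-ratios fail the second test."""
    ll = log_likelihood(freq_study, size_study, freq_ref, size_ref)
    lr = log_ratio(freq_study, size_study, freq_ref, size_ref)
    return ll > ll_min and lr >= lr_min

# Corpus sizes are taken from Table 2; the word counts are hypothetical.
print(minimum_frequency(11_536), minimum_frequency(1_084))
print(is_keyword(300, 588_894, 2_000, 19_401_316))
```

Rounding 5% of the number of texts in each corpus reproduces the cut-offs of 577 and 54 quoted above. Combining a significance test with an effect size in this way means that a word must be both reliably and substantially more frequent in the study corpus to count as a keyword.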

To answer question 3, we carried out a ‘collocation’ analysis of the terms ‘face mask(s)’ and ‘face covering(s)’. Collocation analyses explore co-occurrence relationships between words, and therefore make it possible to study the narratives or discourses that a word is part of. A word collocates with another if it is more likely to be found in close proximity to the other word than elsewhere. Collocations were generated by means of the software package LancsBox, on the basis of the criteria below:

  • Span of 5:5 – a window of five words to the left and five words to the right of the search word.
  • Mutual Information (MI) score ≥ 6. MI is a statistical procedure widely employed in corpus studies to indicate how strong the association between two words is. It is calculated by considering their frequency of co-occurrence in relation to their frequencies when occurring independently in each corpus.
  • Minimum frequency of collocation: 10 occurrences per 1,000 instances of the term in question. For example, ‘face mask(s)’ occurs 1,672 times in the Scottish corpus; the minimum frequency of collocation was therefore 17 instances.
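The criteria above can be sketched as follows. This is a minimal illustration under stated assumptions, not LancsBox's own code: it uses a common textbook formulation of Mutual Information (the binary log of observed over expected co-occurrence, with no correction for window size), and the function names are hypothetical. Counting co-occurrences within the 5:5 span is assumed to have been done already.

```python
import math

def mutual_information(cooccurrences, node_freq, collocate_freq, corpus_size):
    """MI = log2(observed / expected) co-occurrence of node and collocate."""
    # Expected co-occurrence if the two words were distributed independently.
    expected = node_freq * collocate_freq / corpus_size
    return math.log2(cooccurrences / expected)

def minimum_collocation_frequency(node_freq, per_thousand=10):
    """10 occurrences per 1,000 instances of the node, rounded up."""
    return math.ceil(node_freq * per_thousand / 1000)

def is_collocate(cooccurrences, node_freq, collocate_freq, corpus_size,
                 mi_min=6.0):
    """Apply the frequency floor first, then the MI threshold."""
    if cooccurrences < minimum_collocation_frequency(node_freq):
        return False
    mi = mutual_information(cooccurrences, node_freq, collocate_freq, corpus_size)
    return mi >= mi_min

# A node occurring 1,672 times needs at least this many co-occurrences.
print(minimum_collocation_frequency(1_672))
```

For the 1,672 instances quoted above, the frequency floor works out at 17. Because MI rewards rare but exclusive pairings, the frequency floor helps to keep very infrequent words from dominating the list of collocates.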

Similar to the analysis of keywords, collocations were analysed by close reading of their concordance lines.

6 Findings

Finding 1 – Over time, ‘face covering(s)’ became more frequent than ‘face mask(s)’ in the Scottish press, but not in the national press.

Figures 5-6 show the frequency distribution of the terms ‘face mask(s)’ and ‘face covering(s)’ in the two corpora across time, considering the relative frequencies of terms (per 100,000 words). Note that the scale varies from one chart to another; that is due to differences in the amount of data from each corpus.


Figure 5: Relative frequencies of ‘face covering(s)’ and ‘face mask(s)’ in the National Corpus

Figure 6: Relative frequencies of ‘face covering(s)’ and ‘face mask(s)’ in the Scottish Corpus

As can be seen, both corpora show a clear preference for the term ‘face mask(s)’ in the early months, from December 2019 to March 2020, with hardly any mention of the term ‘face covering(s)’. Scottish newspapers seem to have embraced the term first, with mentions of ‘face covering(s)’ increasing swiftly in April 2020, corresponding to nearly half of the number of mentions of ‘face mask(s)’ in that month (83 as compared with 181 instances). National newspapers showed a modest increase in the mentions of ‘face covering(s)’ in April; the term ‘face mask(s)’ was nearly six times more frequent than ‘face covering(s)’ in the national newspapers (2,241 in relation to 386 instances). Mentions of ‘face covering(s)’ continued to rise across both corpora in the following months. In May, they represented about half of the number of mentions of ‘face mask(s)’ in the Scottish corpus and about a third in the National Corpus. By June, mentions of ‘face covering(s)’ surpassed those of ‘face mask(s)’ in Scottish newspapers. In national newspapers, ‘face mask(s)’ remained more frequent than ‘face covering(s)’ across the entire period.
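The comparisons in this section rest on two simple calculations: normalising raw counts to a rate per 100,000 words, and expressing one term's count as a proportion or multiple of the other's. A minimal sketch is given below; the function names are hypothetical, and since monthly word totals are not reported here, only the ratio calculation is applied to the April 2020 counts quoted above.

```python
def relative_frequency(count, corpus_words, per=100_000):
    """Normalise a raw count to occurrences per `per` words."""
    return count / corpus_words * per

def frequency_ratio(count_a, count_b):
    """How many times more (or less) frequent term A is than term B."""
    return count_a / count_b

# April 2020, Scottish press: 'face covering(s)' relative to 'face mask(s)'.
print(round(frequency_ratio(83, 181), 2))     # 0.46, i.e. nearly half
# April 2020, national press: 'face mask(s)' relative to 'face covering(s)'.
print(round(frequency_ratio(2_241, 386), 1))  # 5.8, i.e. nearly six times
```

Normalising by corpus size is what makes the two charts comparable in shape even though the National Corpus contains far more data than the Scottish Corpus.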

Finding 2 – ‘Face covering(s)’ are mentioned much more often, relatively speaking, in the Scottish press than in the national press, alongside other terms for public indoor environments where they may be worn.

The words ‘covering’ and ‘coverings’, which tend to occur in the phrase ‘face covering(s)’, were found to be ‘key’ or ‘overused’ in the Scottish as compared with the National Corpus. In other words, ‘covering’ and ‘coverings’ are used much more often, in terms of relative frequencies, in the Scottish Corpus than in the National Corpus, based on our thresholds for effect size (log-ratio) and statistical significance (log-likelihood). However, based on the same thresholds, the word ‘mask(s)’ is not overused in the National corpus as compared with the Scottish Corpus. This means that ‘covering(s)’ in the Scottish Corpus is not in complementary distribution to ‘mask(s)’ in the National Corpus.

Overall, the keyword calculation retrieved 41 overused items in the Scottish Corpus, using the National Corpus as reference. Table 3 includes the complete list of keywords in the Scottish Corpus, grouped thematically and then ordered by their frequency of occurrence in the corpus.

Table 3 shows that the keywords in the Scottish Corpus include three other terms that are related to face coverings (‘mandatory’, ‘worn’ and ‘mouth’) as well as groups of words that relate to the different environments where face coverings may or may not be recommended or mandatory: Space (e.g. ‘indoor’, ‘outdoor’, ‘household’), Retail/hospitality (e.g. ‘shop’, ‘hospitality’) and Education (e.g. ‘pupils’, ‘teachers’).

Table 3: ‘Keywords’ in the Scottish Corpus, grouped by theme

The overuse of the word ‘kids’ reflects discussions about the age at which face masks/coverings should be made compulsory, as expressed by a reader’s comment published by The Glasgow Evening Times (Extract 1):

(1) “I AM so confused myself. Our kids are going with no distancing and in shops and malls and cinemas and public transport and airports. There is this hype of distancing. Which one is right? Are the poor kids so strong that they will not catch it at all and will not bring anything back home to their elderly grans etc? So illogical!” (The Glasgow Evening Times, 21.08.2020)

The keywords also include a group that is to do with Other Measures to reduce contagion, particularly in public spaces such as shops, restaurants and pubs (e.g. ‘screens’, ‘two-metre’). This is because face coverings are often presented as necessary when those other measures are not practicable:

(2) The government guidance says: “If you can, wear a face covering in an enclosed space where social distancing isn’t possible and where you will come into contact with people you do not normally meet.” (The National, 25.06.2020)

Finding 3 – Face ‘mask(s)’ and ‘covering(s)’ have partly different collocates, reflecting differences in status and associated narratives.

We now examine the collocates of ‘face mask(s)’ and ‘face covering(s)’ in the two corpora. These are listed in Tables 4 and 5, in decreasing order of frequency of co-occurrence with each term.


Table 4: Collocations of ‘face mask(s)’ and ‘face covering(s)’ in the Scottish Corpus

Table 5: Collocations of ‘face mask(s)’ and ‘face covering(s)’ in the National Corpus

Five words appeared as collocates of both ‘face mask(s)’ and ‘face covering(s)’ in both corpora. These are: three different forms of the verb ‘wear’ (‘wear’, ‘wearing’, ‘worn’), ‘compulsory’ and ‘mandatory’. These suggest that ‘mask(s)’ and ‘covering(s)’ are both used in the context of debates and decisions about the need or obligation to wear them in certain settings.

Figure 7: Instances of ‘face mask(s)’ in the Scottish Corpus

However, the collocates that only apply to ‘face mask(s)’ show that they tend to be talked about as a type of PPE in clinical or care settings (e.g. ‘protective’, ‘surgical’, ‘gloves’, ‘aprons’).

(3) Carers, many of whom are paid low wages by private sector firms, have complained they have not been provided with essential items such as hand sanitiser, gloves, aprons, and face masks. (The Independent, 24.03.2020)

In contrast, the collocates that only apply to ‘covering(s)’ show that they tend to be talked about as a non-medical item of clothing that is:

  • made of cloth and a potential fashion accessory or political statement (‘cloth’, ‘branded’);

(4) Currently no other party is selling branded face coverings, although many independent online shops stock masks with Union flag or political designs. (The National, 25.07.2020)

(5) Face coverings include scarves, a piece of cloth or a mask and certain travellers – such as people with disabilities or breathing difficulties – will be exempt. (The Daily Express, 06.06.2020)

  • recommended to be worn (e.g. ‘recommended’, ‘advised’);

(6) Earlier this week, First Minister Nicola Sturgeon recommended the limited use of face coverings – not necessarily masks – when social distancing is hard to maintain. (Glasgow Evening Times, 04.05.2020)

(7) Other precautions advised include wearing face coverings in public as much as possible, keeping two metres apart, avoiding physical contact with those outside one’s household and to be tested and isolate if told to do so. (The Telegraph, 18.07.2020)

  • in specific indoor public settings (‘crowded’, ‘enclosed’; ‘shops’, ‘transport’);

(8) “However, we are recommending you do wear a cloth face covering if you are in an enclosed space with others where social distancing is difficult – for example, on public transport, or in a shop.” (The National, 28.04.2020)

(9) It is compulsory to wear face coverings on public transport, in shops and when collecting takeaway food. (The Sun, 14.08.2020)

  • by large sections of the population (‘secondary’, ‘pupils’, ‘passengers’).

(10) A SECONDARY school is asking pupils to wear face coverings as part of efforts to combat the spread of coronavirus. (The Herald, 23.08.2020)

(11) Passengers have been told to wear face coverings on public transport to prevent a further outbreak of coronavirus as Britain slowly emerges from the lockdown. (The Times, 12.05.2020)

What does not, however, emerge from the collocates of ‘face covering(s)’ in either corpus is a consistent message about their role in protecting others from droplets produced by the wearer, thus reducing transmission overall. This may partly explain ongoing opposition to or scepticism about the usefulness of face coverings during the pandemic.

7 Conclusions

Overall, in the period December 2019 – August 2020, reports on face mask(s)/covering(s) in the Scottish press contrasted with the national press in terms of: a preference for ‘face covering(s)’ over ‘face mask(s)’ from June 2020 onwards; and a greater concern for their use to mitigate the transmission of the virus in schools, shops and other public indoor environments. This can only be partly explained by the fact that the Westminster government made decisions about the recommended/mandatory use of face coverings in public indoor spaces slightly later than the Scottish devolved administration. The contrasting collocates of ‘face covering(s)’ vs. ‘face mask(s)’ confirm that they are associated with different settings and narratives: PPE in clinical/care settings vs. item of clothing/accessory to be worn in public indoor environments by the general population as a public health measure. In the period under consideration, the latter narrative was therefore increasingly prevalent in Scottish but not in national newspapers.

Introductory Blog – Hanna Schmueck

I am very honoured to have received the Geoffrey Leech Outstanding MA Student Award for my MA in Language and Linguistics. This award traditionally goes to the MA student with the highest overall average.

I started my postgraduate journey in September 2019 after finishing my undergraduate degree at the University of Bamberg (Germany) in 2018 and working as a freelance translator and teacher for a year. I’ve always had an interest in the way language influences us both as individuals and as a society and have carried with me a fascination for experimentation and statistics. I first discovered corpus linguistics in the second year of my undergraduate degree, and it soon became my primary research interest. I chose a corpus-based project for my undergraduate dissertation on pronouns in the English-lexifier lingua franca Bislama. From there I realised that much of the relevant methodological literature had been published by Lancaster academics – which cemented my decision to apply to Lancaster despite having to move abroad and face a number of Brexit-related administrative hurdles.

When I finally came to Lancaster for my MA, I felt welcome in the department from day one and I had the chance to attend/audit a wide variety of modules such as Cognitive Linguistics, Experimental Approaches to Language and Cognition, Forensic Linguistics, Stylistics, and Corpus Linguistics. The freedom of choice that Lancaster MA students in Language and Linguistics are given was another major motivation for studying at Lancaster and the flexible approach really benefited my personal learning experience. Another important element of my academic learning experience was being able to attend research groups – such as the Trinity group and UCREL talks – which focus on a wide variety of topics and allow you to come into contact with people who have all kinds of specialisms while getting the opportunity to develop your own research interests further.

Like all of us, I had not foreseen that my MA would move online in spring, nor all the challenges COVID-19 would bring, but after the first phase of getting used to the situation I tried my best to see this as an opportunity to focus on my MA thesis, titled “More than the sum of its parts: Collocation networks in the written section of the BNC2014 Baby+”. The aim of this thesis was to explore corpus-wide collocation networks and their structural and graph-theoretical properties using the BNC2014 Baby+ as the underlying dataset. I developed a method to create and display large weighted networks based on MI2 scores, in order to analyse the meta-level collocational patterns that emerge, and performed a graph-theoretical analysis on them. The results obtained from this pilot study suggested that there is an underlying structure that all sections in the BNC2014 Baby+ share and that the structure of the generated networks resembles other networks from a wide variety of phenomena such as power grids, social networks, and networks of brain neurons. The findings indicated that there are, however, text-type specific differences in terms of how connected different topic areas are and that certain words serve as hubs connecting topics with one another. The network displayed below is an example taken from the BNC2014 Baby+ academic books section with a filter applied to only show the node “award”, its direct neighbours and their weighted interrelations.

I am very grateful for having had the opportunity to learn from and exchange ideas with so many amazing academics in the department over the course of my MA and I’m very excited to carry on researching collocation networks for my PhD here at Lancaster.

Representations of Obesity in the News: Project update and book announcement!

Gavin Brookes and Paul Baker

We are delighted to announce the forthcoming publication of a book based on research carried out as part of the CASS project, ‘Representations of Obesity in the News’. The book, titled Obesity in the News: Language and Representation in the Press, will be published by Cambridge University Press in 2021. You can see a sneak preview of the cover here!

The book reports analysis of a 36 million-word corpus of all UK national newspaper articles mentioning obese or obesity published over a ten-year period (2008-2017). This analysis combines methods from Corpus Linguistics with Critical Discourse Studies to explore the discourses that characterise press coverage of obesity during this period. The book explores a wide range of themes in this large dataset, with chapters that answer the following questions:

• What discourses characterise representations of obesity in the press as a whole?

• How do obesity discourses differ according to newspapers’ formats and political leanings?

• How have obesity discourses changed over time, and how do they interact with the annual news cycle?

• How does the press use language to shame and stigmatise people with obesity, and how are attempts to ‘reclaim’ the notion of obesity depicted?

• What discourses surround the core concepts of the ‘healthy body’, ‘diet’ and ‘exercise’ in press coverage of obesity?

• How do obesity discourses interact with gender, and how does this influence the ways in which men and women with obesity are represented?

• How does the press talk about social class in relation to obesity, and how do such discourses contribute to differing depictions of obesity in people from different social class groups?

• Finally, how do audiences respond to press depictions of obesity in below-the-line comments on online articles?

The book will be the latest output from this project. You can read more about our work on changing representations of obesity over time in this recent Open Access article published in Social Science & Medicine. We are also working on articles which expand our analysis of obesity and social class, depictions of obesity risk, and obesity discourses in press coverage of the coronavirus pandemic, so keep your eyes peeled for further announcements!