Celebrating the Written BNC2014: Lancaster Castle event

On 19 November 2021, the ESRC Centre for Corpus Approaches to Social Science (CASS) organised an event to celebrate the launch of the Written British National Corpus 2014 (BNC2014). The event was live-streamed from a very special location: the medieval Lancaster Castle. About 20 participants attended on site and more than 1,200 joined the event online. Dr Vaclav Brezina opened the event and welcomed participants from over 30 different countries. After the official welcome by Professor Elena Semino and Professor Paul Connolly, a series of invited talks was delivered by prominent speakers from the UK and abroad. The talks covered topics such as corpus development, corpora in the classroom, corpora and fiction, and the historical development of English.

The BNC2014 is now available, together with its predecessor the BNC1994, via #LancsBox X.

#LancsBox X interface

More information about the design and development of the Written BNC2014 is available from this open access research article:

If you missed the event, recordings of the individual sessions are available below. You can also view the PDF slides about the Written BNC2014.

Online programme: Lancaster Corpus Linguistics
Vaclav Brezina, Elena Semino, Paul Connolly (Lancaster University): Welcome and Introduction to the event
Tony McEnery (Lancaster University): The idea of the written BNC2014
Dawn Knight (Cardiff University): Building a National Corpus: The story of the National Corpus of Contemporary Welsh
Vaclav Brezina and William Platt (Lancaster University): Current British English and Exploring the BNC2014 using #LancsBox X
Randi Reppen (Northern Arizona University): Corpora in the classroom
Alice Deignan (University of Leeds): Corpora in education
Dana Gablasova (Lancaster University): Corpus for schools
Bas Aarts (University College London): Plonker of a politician NPs
Marc Alexander (University of Glasgow): British English: A historical perspective
Michaela Mahlberg (University of Birmingham): Corpora and fiction
Martin Wynne (University of Oxford): CLARIN – corpora, corpus tools and collaboration
Vaclav Brezina: Farewell

Meet our students: Masters in Corpus Linguistics

Lancaster University is very proud to offer MA and Postgraduate Certificate programmes in Corpus linguistics. The programmes aim to equip students with skills that will enable them to analyse large amounts of linguistic data (corpora) using cutting-edge computational technology.

We asked our future students a few questions about their interests and motivation to study at Lancaster.

Alexandra Terashima: “Applying for this program represents a major pivot in my life.”

Hello! My name is Alexandra Terashima and I’ve recently been accepted into the Corpus Linguistics (Distance) MA program. I am originally from Russia, but I grew up and studied in the United States, and I am currently living in Japan.

I feel incredibly grateful to have been selected to receive a bursary to support my studies towards an MA in Corpus Linguistics.

Can you tell us a little bit about your background and research interests?

Applying for this program represents a major pivot in my life: I already have a PhD in genetics and worked for several years as a researcher in a lab. But something was missing for me, and a few years ago I stepped away from the bench and turned towards the communication side of science. I spent a few years helping scientists edit and revise papers for publication, which led to my current position, teaching academic writing to English language learners.

My research interests include language acquisition, in particular how learners of English acquire knowledge of formulaic language, such as collocations and multi-word phrases, particularly ones that are used in specific genres of writing, such as scientific literature.

Why have you applied to study MA in Corpus linguistics at Lancaster University?

While I am perhaps not a traditional MA program student, I applied to this program after careful consideration of my future career goals. During my time as a biology researcher, I was fascinated by the fact that, while scientific articles play a big role in the career of a scientist, the conventions of how to write scientific articles are not taught to science students at either the undergraduate or graduate level. Instead, students are expected to learn how to write from their supervisor and other lab members.

When I worked as an in-house editor at a research institute, I saw first-hand how the quality of writing can influence an editor’s response to, and reviewers’ comments on, a submitted manuscript, regardless of the quality of the scientific findings. Through working closely with scientists to help them improve their papers for publication, I became interested in education, and five years ago I started working at the University of Tokyo, teaching academic writing to undergraduate students. Also five years ago, I was introduced to corpus linguistics at an English for Specific Purposes conference, where I heard talks by Laurence Anthony and Paul Thompson. The methodology of systematic analysis of language for patterns appealed to me and I began exploring this area of research in the context of my teaching. My career goal is to have a position in academia that combines teaching, research and supervision of graduate students, but I feel that I need additional qualifications to achieve it. I have been contemplating an MA in applied linguistics for several years as a way to acquire research training and qualifications in this field. In parallel, I became aware of Lancaster University as one of the leaders in the field of corpus linguistics by reading the literature and taking part in the Corpus Linguistics MOOC on FutureLearn. Last fall, when I saw the announcement for this new distance MA program in corpus linguistics, I knew it was time to apply!

Can you tell us a little bit about the topic you have selected for your MA dissertation?

Because of my strong interest in formulaic language, my MA dissertation focuses on the use of corpus analysis tools to measure and visualize phraseological development in spoken L2 English. In particular, I will explore whether different levels of L2 proficiency can be distinguished by differences in the knowledge of collocations and, if so, which statistical measures for identifying collocations are most effective. This project will utilize the Trinity Lancaster Corpus, which, in addition to being the largest spoken learner corpus of its kind, is rich in metadata that allows users to quickly access the data of interest, such as samples from different levels of L2 proficiency. I will also need to learn my way around #LancsBox for this project, which no doubt will be an invaluable tool in my future research.

Why have you selected this topic?

As a lifelong language learner, I am fascinated by how people acquire language, are taught language and ultimately, how they use language. I believe formulaic language, namely collocations and collocation networks, is one of the cornerstones of language study that can help improve learner motivation and accelerate the understanding of an L2 language.

I selected this topic because I am intrigued by the challenge of distinguishing collocational knowledge at different levels of L2 proficiency. I recognize the importance of such distinctions for developing assessment tools and graded teaching materials. It is also reasonable to assume that learners acquire L2 proficiency in different ways and so defining the borders between different levels of L2 English proficiency in terms of collocation knowledge is a challenging and useful endeavor, one that goes a step beyond vocabulary and grammar knowledge assessment.

What are your plans for the future?

For my future research, I would like to focus on formulaic language, specifically language used in scientific papers. I would like to help establish conventions to teach science paper writing systematically to undergraduate and graduate students, to bridge the gap for scientists struggling to publish due to the poor writing skills of their supervisor or due to being a non-native English speaker. Most of the current studies analyzing scientific papers have, understandably, been carried out by linguists. While these studies provide many useful insights, I feel that limited familiarity with scientific research culture and with the culture of scientific publishing prevents them from fully capturing the dynamic and evolving nature of the language of scientific publications. I believe that my background as a scientist can help bridge this gap and help expand this genre of linguistic research.

Lee Daniels: Corpus linguistics at Lancaster is “a fantastic opportunity for me!”

Hi there! My name is Lee Daniels, and I am a bursary holder for the Corpus Linguistics MA at Lancaster University.

I am a 28-year-old North Yorkshireman turned Mancunian, who has lived in Salford for the past seven years. I have just completed my B.A. (Hons) Linguistics undergraduate degree with Manchester Metropolitan University, and I am incredibly excited for this fantastic opportunity with Lancaster University!

So! Let me tell you a little bit about myself in a mini-interview format.

Can you tell us a little bit about your background and research interests?

I began my higher education relatively late, that is, it was not until the age of 25 that I entered Manchester Metropolitan University (MMU) as a mature student studying Linguistics and Italian. Prior to this I was working as a Third-Party Liability and Credit-Hire Motor Claims Handler. However, for a multitude of reasons, I decided that this career path was not for me and I wanted to dedicate my efforts to something where my passions lay. That passion was (and still is) any and all things Linguistics! Subsequently, I studied, paid for, and completed the qualifications needed (iGCSE and A-Level Italian) to gain entry into university and develop these passions further.

Through three fantastic years of study at MMU, I honed these passions into particular research interests, namely the sub-disciplines of cognitive linguistics, pragmatics (with a dash of semantics) and corpus linguistics (go figure!). In particular, my interests lie in the combination of these three areas. As I argue in my undergraduate dissertation research, isolating language conceptualisation from the real-world context in which it is found is counterintuitive. Thus, in line with an emerging socio-cognitive sub-discipline, my interests lie in intertwining the conceptual and pragmatic processes which may influence unique language conceptualisations and, in turn, language output.

I have found that the application of the corpus linguistic methodology, with its ever-developing capabilities thanks to ever-emerging new technology, provides a fantastic opportunity to offer some substantiation or refutation of such claims (although I hope the former!). The integration of these interests is something that I have initiated in my dissertation project and would love to continue to pursue throughout my academic career.

Why have you applied to study MA in Corpus linguistics at Lancaster University?

Not only does Lancaster have one of the best Linguistics departments in the world, but the corpus work coming out of the institution is also at the cutting edge of the discipline. During my time at MMU, I often drew on the corpus work of Lancaster scholars to demonstrate the benefits and applicability of its methodology, be it through Baker, Brezina, McEnery, Hardie, Semino, Culpeper (and many more). I thus quickly learned of Lancaster’s position at the forefront of the field.

I have also had the pleasure of working with some Lancaster alumni, such as Professor Dawn Archer and Dr Sean Murphy, on a corpus-led research project looking at Shakespeare’s representation of gender in his works. This made use of the Enhanced Shakespearean Corpus (ESC) and CQPweb (developed at Lancaster). Additionally, I enjoy a fantastic and productive working relationship with Dr Lexi Webster, which I hope will continue for many years and produce exciting work. Above all, I applied to Lancaster because I want to contribute to, and be associated with, the incredible work and people of the institution.

Can you tell us a little bit about the topic you have selected for your MA dissertation?

I have chosen to study disagreement strategies in spoken L2 English (English spoken as a non-native language). This study will utilise the Trinity Lancaster Corpus (TLC), developed at the ESRC Centre for Corpus Approaches to Social Science (CASS), Lancaster University, in collaboration with Trinity College London. TLC contains the largest body of spoken L2 English across all corpora and is thus best placed for this MA dissertation. The topic allows the analysis of a complex pragmatic process (disagreement) through empirical means, complemented by in-depth qualitative analysis. The findings may then enhance our understanding of second language pragmatic abilities and of communicative strategies in language testing, and may thus contribute to greater understanding and improved practice within TESOL/TEFL contexts.

Why have you selected this topic?

What drew me to this topic was the opportunity to provide insight into a pragmatic communicative strategy while also exploring my research interests. That is, the project allows me to further explore the conceptual and contextual practices that lie behind the construction of pragmatic strategies.

Using corpus methods to substantiate such a complex pragmatic phenomenon also falls in line with my interests. I think we are in an exciting time for Linguistics because the technology associated with corpus work is only getting better and more capable. With that expansion, all sorts of new research may be attempted into complex phenomena (like L2 English disagreement strategies!) that were previously not feasible to study. Therefore, to be at an institution that fully shares this thinking is a fantastic opportunity for me!

What are your plans for the future?

More Linguistics! In other words, my aim is to become a Lecturer within the field. In addition to having a passion for the discipline, I also love rambling on about it (if you have not guessed already!). I developed this at MMU by applying it in a teaching capacity in both paid and voluntary roles. I find teaching a topic that I am genuinely passionate about, and trying to stir that same passion in others, to be incredibly rewarding. To reach this goal I need to acquire my PhD, and I would love this to be at Lancaster via a similar corpus-led opportunity. It will require a lot of hard work, but I am as committed now as I was on day one when I started this incredibly rewarding journey!

Launch of new project – Questioning Vaccination Discourse (Quo VaDis): A Corpus-based study

A new ESRC-funded project based in CASS will apply the methods of corpus linguistics to arrive at new understandings of vaccine hesitancy, which the World Health Organization lists among the top 10 global health challenges, and defines as ‘a delay in acceptance or refusal of vaccines despite availability of vaccination services’.

Vaccine hesitancy is often a consequence of views and attitudes that are formed and exchanged through discourse, for example by reading the news, listening to politicians and interacting on social media. The ‘Quo VaDis’ project (Questioning Vaccination Discourse) will employ corpus linguistic methods to study systematically the ways in which vaccinations are discussed, both currently and historically, in the UK press, UK parliamentary debates, and social media (Twitter, Reddit and Mumsnet). The goal is to arrive at a better understanding of pro- and anti-vaccination views, as well as undecided views, and to use the findings to inform future public health campaigns about vaccinations, in collaboration with public health agencies. For more information: https://www.lancaster.ac.uk/vaccination-discourse/ Twitter: @vaccine_project

William Dance – Introductory Blog

My name is William Dance and I’m one of two new Senior Research Associates in CASS.

I’m currently finishing my PhD in the linguistics department here and my main research interests are corpus approaches to deception and manipulation, using methods like (critical) discourse analysis to study online disinformation (better known as ‘fake news’).

I’m working alongside Tara Coltman-Patel on the new ESRC-funded ‘Questioning Vaccination Discourse’ project (or Quo VaDis, Latin for ‘Where are you going?’). Alongside collaborators from Public Health England, UCL, and the University of Leeds, the project looks at how the public, press, and policymakers speak and write about vaccinations both online and offline. The goal of the project (which, believe it or not, was submitted before the COVID-19 pandemic!) is to get a better understanding of how pro- and anti-vaccination views spread online, as well as how the vaccine-uncertain people in the middle express their views.

I’ve found myself over the last few years researching topics just as they seem to gain global attention. I started researching disinformation during my Masters just as Donald Trump was elected president and “fake news” became a hot topic. Similarly, I joined the Quo VaDis project just as a global pandemic began and vaccination became more important than ever before.

My research into disinformation has given me some amazing opportunities over the past few years. I’ve had the fortune to do things like present my research to parliamentarians, be seconded to Whitehall for three months, and work with over 50 news organisations and state broadcasters to disseminate my research and help inform the public about online deception. This kind of external engagement is a theme throughout all of my work and I always try to reach out to communities outside of academia whenever I can. I also run a blog which you can find here.

Disinformation is a wide-reaching topic, and my research so far has mainly focused on social media users’ motivations for sharing disinformation and on the analysis of hostile-state information operations (HSIOs). Future publications will explore algorithmic disinformation and the spread of online disinformation.

Outside of work, one of my favourite hobbies is baking. This is something I do most evenings and weekends as I enjoy planning and writing recipes, and then baking things for friends and family (although I enjoy the washing up a lot less…). I’ve been baking and cooking pretty much since I could walk as I was taught to cook from a young age. You can see some of my creations here but my favourite thing to bake is bread.

I think the best way to end this introduction is just to say how much I’m looking forward to what the Quo VaDis project, and working in CASS in general, has to offer. I’m grateful to be working in one of the best corpus research centres in the world and I can’t wait to see what the next three years bring.

Tara Coltman-Patel – Introductory Blog

My name is Tara Coltman-Patel and I am so excited to be a new member of CASS.

I am working as one of the Senior Research Associates on the ESRC-funded Quo VaDis project (Questioning Vaccination Discourse: A Corpus-Based Study), which explores discussions about vaccinations in UK parliamentary debates, in UK national newspapers and on the social media sites Twitter, Reddit and Mumsnet. Using a variety of corpus tools and techniques, we aim to gain a better understanding of the wide spectrum of pro-, anti- and undecided views surrounding vaccinations. Analysing how vaccinations are discussed across a variety of contexts, how the different views are communicated, and how people with different views interact, particularly on social media, will be invaluable for addressing vaccine hesitancy. With our results we aim to inform, facilitate and help design future public health campaigns about vaccinations. As vaccinations are a salient topic, especially given the time we are currently living through, I am extremely grateful to have the opportunity to work on this research.

Before joining CASS I was working at Nottingham Trent University, where I recently finished my PhD, which focussed on weight stigma and the representation of obesity in the British press. In it I explored how metaphors can sensationalise and dehumanise people with obesity, how science is recontextualised and misrepresented, and the linguistic strategies of representation used in personal stories about weight loss. I am currently in the process of turning that research into a book titled ‘(Mis)Representing Obesity in the Press: Fear, Divisiveness, Shame and Stigma’, which will hopefully be published towards the end of 2022. Weight discrimination is a topic I am incredibly passionate about, and in addition to research I have also worked as an anti-weight discrimination advocate and have consulted on global campaigns with the World Obesity Federation.

Outside of research I am a massive bookworm, I’m obsessed with RuPaul’s Drag Race and I’m also a sucker for a nice beer garden. Before Covid I loved to travel and have backpacked around Australia, Thailand, The Philippines, Mauritius and South Africa. I have some amazing and memorable moments from those trips, from bad ones like falling off a (small) cliff in Mauritius and being bitten on my hand by a spider in Australia, to incredible ones like canyoneering in The Philippines and swimming with sharks in Australia and South Africa. Sharks are my favourite animal and I have a plethora of fun facts about them ready to share at any given moment, so you definitely won’t regret inviting me to parties …

To conclude, I’m really thrilled to be a part of CASS and the Quo VaDis project, and as I have run out of interesting things to say about myself, I’ll end this blog post here.

#LancsBox: The emerging historical linguist’s MO? A brief case study of Aramaic.

By: Charbel El-Khaissi

I took Lancaster University’s free Corpus Linguistics course (Corpus MOOC) to fill time. Three months later, a doctoral research proposal enabled by #LancsBox, a software tool introduced in the course, was accepted at the Australian National University.

For as long as the topic has been studied, research on ancient Semitic languages has relied on classical philological approaches. Naturally, a tension exists between this tradition and contemporary approaches in computational linguistics. It would be unfair to characterise this divide as a mere consequence of ‘old-school’ scholars resisting technological changes in research, because philology is an inherent part of the study. The study of any ancient language requires far more human involvement than a machine can achieve: a careful hand to conserve and restore manuscripts, a keen eye for epigraphic analysis and a well-rounded, learned mind to interpret literature in medias res, politically, theologically and societally. However, as far as the researcher is open to computer-assisted technology, #LancsBox fills a gap in historical linguistics, especially in the field of Semitic historical syntax.

As a case in point, consider the Aramaic language: the longest continuously spoken Semitic language, with an attested lifespan of approximately 3,000 years. This language offers linguists intriguing insights into how human languages change over a substantial time period, including changes in underlying structure (i.e., grammar and syntax). If these changes are substantiated, they may offer important clues concerning the evolution of human cognition itself. Yet the historical syntax of Aramaic remains largely underrepresented and understudied. A few commendable scholars have undertaken the task of analysing developments in areas of Aramaic grammar (e.g., Huehnergard, 2005; Rubin, 2005; Grassi, 2009; Pat-El, 2012; Coghill, 2012). Among other reasons, the lack of rigorous study in this discipline is due to the labour-intensive task of qualitatively analysing large corpora. This task is made more difficult by a manual transcription and grammatical tagging process, in addition to administrative duties such as record management and categorisation. Recent advancements in Aramaic computational linguistics (including, but not limited to, Handwritten Text Recognition (HTR) technology and digital archives) have significantly reduced the time needed for text transcription and tagging. However, the diachronic analysis of large corpora remains tedious without free, user-friendly and accessible corpus software like #LancsBox.

My doctoral research is among the first studies in Semitic historical linguistics to experiment with Lancaster University’s #LancsBox corpus software and analyse Aramaic syntax over time. Thus far, it has proven to be an exceptional tool for data management and diachronic analysis (see Figure 1 and Figure 2):

• Corpus management: the ease of creating, storing and analysing (sub-)corpora based on variables of interest (e.g., by dialect, century, author) reduces administrative overhead and gives me more time to test different hypotheses according to multiple variables.

• POS-tagging: in addition to offering POS tagging in a number of languages, #LancsBox caters to self-tagged corpora. This means I can import datasets that have been annotated according to my own tagging scheme, which gives me flexibility when testing the robustness of tag sets according to various theoretical frameworks.

As with any computer software, a few caveats are worth mentioning to historical Semitic linguists interested in using the software for their research.

• Coding: basic knowledge of regular expressions is needed to execute meaningful, in-context searches.

• Font: in its current version (5.0), Aramaic is only partially supported, with some fonts appearing disconnected. This makes in-tool legibility difficult, but not impossible.

• Text-direction: in its current version (5.0), Aramaic texts appear reversed (e.g., “cat” appears “tac”). Current workarounds include (1) using free, online tools to reverse the text prior to import, or (2) conducting analysis outside the tool.
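For readers who prefer to apply the first text-direction workaround locally rather than with an online tool, the following is a minimal Python sketch. It is not part of #LancsBox, and the function names `reverse_line` and `reverse_text` are hypothetical. It assumes that reversing the character order of each line is all the import step needs, and it keeps combining marks (such as vowel points) attached to their base letters so that they do not drift onto the neighbouring letter.

```python
import unicodedata


def reverse_line(line: str) -> str:
    """Reverse one line, keeping combining marks with their base letter."""
    clusters = []
    for ch in line:
        # A combining mark (non-zero combining class) belongs to the
        # preceding base character, so append it to the last cluster.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))


def reverse_text(text: str) -> str:
    """Reverse every line of a text while preserving the line order."""
    return "\n".join(reverse_line(line) for line in text.split("\n"))


if __name__ == "__main__":
    # Hypothetical sample: any string works, including Aramaic script.
    print(reverse_text("cat\ndog"))
```

Whether a given corpus needs this step depends on the tool version and the font, so it is worth checking a short sample in the tool before converting a whole corpus.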

Will #LancsBox become the MO for future historical linguists? Only time will tell. It seems to me to be the only accessible software currently available for linguists who wish to build and design their own corpus, especially in underrepresented and under-resourced languages. In fact, I can think of a number of innovative applications outside the research domain as well: for example, Australian linguists might be able to use #LancsBox to investigate which linguistic features have been declining in student writing over the last decade. Perhaps then #LancsBox’s core functionalities could help academics in other fields and a wider group of users.

Watch a 60-second video of Charbel El-Khaissi’s research here.

Acknowledgements: Thank you to Professor Tony McEnery, Dr Pierre Weill-Tessier and Dr Vaclav Brezina, whose innovations have enabled my research. I express gratitude to my supervisory panel for their ongoing guidance.

British Muslims Caught Amidst FOGs – A Discourse Analysis of Religious Advice and Authority

By Usman Maravia

In this blog entry, I will provide an overview of my latest article, which explores the writing style of Islamic advice texts on COVID-19. The issues addressed in these advice texts related to mosque closures, funerary rites, fasting during Ramadan, and suspending Friday and daily prayers to help curb the spread of COVID-19. These texts were being circulated in the UK in March and April of 2020, a crucial period in which information was passed on to address issues that, in the scope of the study, British Muslims would face in Ramadan, which began on 25th April 2020.

The context

My interest in this topic was sparked by the unfortunate COVID-19 related death of an elderly Muslim from Walsall. A family member of the deceased stated in the press that “It is imperative that we learn from this tragic loss and comply with Government guidelines to save lives”. What further caught my interest was that, if the aim of the Islamic advice documents was to help Muslims stay safe during the pandemic, a unified and standardised message produced in collaboration between Muslim faith leaders and health professionals would have been helpful. Instead, a range of documents was in circulation, and this led to ambiguity about exactly what preventative measures British Muslims were to take and where exactly the authority lay.

Moreover, the titles of these documents differed. Some were titled fatwa, which is a non-binding legal opinion of an Islamic legal expert, but still a document that could potentially carry much influence on Muslim communities in the UK. Some documents were written by healthcare professionals and were titled guidance documents. I wondered, do these documents carry the same weight as fatwas? Yet other documents were titled neither fatwa nor guidance but were written in a hybrid style of the two categories. Again I wondered, why were these words used in the titles?

The FOG corpus

As such, I sought to identify (a) the underlying reasons behind the titling of the documents; and (b) the construction of discourses in the documents. In collaboration with my colleagues Zhazira Bekzhanova (Astana IT University, Kazakhstan), Mansur Ali (Centre for the Study of Islam in the UK, Cardiff University), and Rakan Alibri (University of Tabuk), we collected a total of 76 texts that were available online on websites of British mosques, Facebook pages and other online venues. We found that of these 76 documents, 14 were clearly titled fatwa. We also found that six were titled guidance documents, and an eye-catching 56, which we refer to as other documents, included a range of words in their titles such as analysis, clarification, confirmation, guidelines, method, pathway, permissibility, plan of action, points, recommendation, response, ruling, and statement. This classification led to our jocular acronym FOG, i.e., fatwas, other documents, and guidance documents. This compilation then led to the creation of the specialised FOG corpus, consisting of around 110,000 words.

We examined these written electronic texts in the social context of Muslims and COVID-19 in the UK. We explored the way language was used in real life in fatwas, guidance documents, and other documents. We then focused on the way the authors of these documents differed in their writing styles to create a certain impression on the audience by increasing, in Bourdieu’s terms, symbolic capital. We also drew on the representation of social actors (van Leeuwen, 1995) to decipher power relations across the FOG documents, analysing and interpreting references both to the text producers and to the audience through the prism of authority.

Corpus methods

We applied corpus-assisted critical discourse analysis, which helped us to uncover important patterns in relation to FOGs. Using AntConc software, we analysed word frequencies, word lists, lexical bundles, collocations, concordance plots, and concordances to detect linguistic patterns in the FOG corpus. Corpus methods also gave us the tools to detect power hierarchies and inequalities within the texts. Moreover, our corpus-assisted study supports Brookes and McEnery’s finding that texts acquire symbolic capital through an accumulation of patterns of textual cohesion and rhetorical strategies. We found that the documents appear to follow an underlying hierarchy among British Muslim scholars.
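To illustrate the kind of counts that sit behind a word list and lexical bundles (recurring sequences of n words), here is a minimal Python sketch. It is an illustration only: it is not how AntConc is implemented, nor the procedure used in the study, and the simple tokeniser, the function names and the sample sentence are all assumptions.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_frequencies(tokens: list[str]) -> Counter:
    """Count how often each word form occurs (a basic word list)."""
    return Counter(tokens)


def lexical_bundles(tokens: list[str], n: int = 3) -> Counter:
    """Count every contiguous n-word sequence in the token list."""
    return Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


if __name__ == "__main__":
    # Hypothetical sample text, not taken from the FOG corpus.
    sample = "And Allah knows best. According to the scholars, Allah knows best."
    tokens = tokenize(sample)
    print(word_frequencies(tokens).most_common(3))
    print(lexical_bundles(tokens, n=3).most_common(1))
```

In practice, bundles are usually filtered by a minimum frequency and by how many different texts they occur in, which a dedicated tool handles through its settings.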

Findings

To elaborate, a particular writing style can be found across the FOG documents. We found fatwas and guidance documents to be textual opposites, whereas other documents were found to feature greater intertextuality and to maintain respect for the authority of muftis and their fatwas, but with reservations. The fatwas were found to be written by senior muftis and contained important references to the Qur’an and Muhammad, the Prophet of Islam. Fatwas also included legal terminology in Arabic related to Shariah law. Moreover, fatwas contained phrases such as ‘according to’ and ‘Allah knows best’.

Such a writing style is in accordance with the traditional writing style of fatwas and thereby holds higher symbolic capital. On the other hand, guidance documents were produced by healthcare professionals and did not contain such theologically related phrases but rather relied on scientific and medical language. Interestingly, we found the other category of documents to be written in a hybrid style combining fatwas and guidance documents. Such a writing style appears to increase the symbolic capital of these documents and to empower the writers to challenge existing fatwas – whilst maintaining respect for senior muftis.

While the FOG documents reveal that multiple voices are welcome in addressing a national emergency, we recommend that a standardisation of documents, issued in collaboration with the NHS and senior muftis, could perhaps give a clearer action plan for British Muslims in future. As such, this study is intended to give an impetus to social scientists to explore the discourse of British Muslims and COVID-19 through a linguistic lens.

Our article is available to read in MDPI’s open access journal Religions. Additionally, further research is being carried out on the topic of COVID-19 by the British Islamic Medical Association (BIMA) as part of ‘Operation Vaccination’.

For my article on addressing vaccine resistance from an Islamic perspective, please read Vaccines: religio-cultural arguments from an Islamic perspective published by JBIMA.

‘Face masks’ and ‘face coverings’ in the UK press during the Covid-19 pandemic: Scottish vs. national newspapers

Carmen Dayrell, Isobelle Clarke and Elena Semino (Lancaster University)

1 Introduction

Since the beginning of the Covid-19 pandemic, the use of face masks or face coverings as a means of reducing the transmission of the virus has been a major area of debate in many countries around the world. In the UK specifically, the first nine months of 2020 saw a rapid change from a view of face masks as a medical piece of PPE that would not be appropriate or acceptable for the general population, to the establishment of non-surgical face coverings as a recommended public health measure in indoor public spaces, such as buses and supermarkets. As with other aspects of the response to the pandemic, during that time there were differences in the approach to face masks/coverings between the Scottish devolved administration and the Westminster government.

Table 1 provides a timeline summary of policy decisions concerning face masks/coverings on public transport, in shops and in schools in Scotland and England. For the most part, in Scotland face coverings were recommended or made mandatory earlier than in England. They are also mandatory in corridors and communal areas in Scottish schools, whereas in England this is at the school’s discretion.

| Month | Public transport | Shops | Schools |
| --- | --- | --- | --- |
| April | (28th) Scotland (recommended) | (28th) Scotland (recommended) | |
| May | (11th) England (recommended) | (11th) England (recommended) | |
| June | (15th) England (mandatory); (22nd) Scotland (mandatory) | | |
| July | | (10th) Scotland (mandatory); (24th) England (mandatory) | |
| August | | | (31st) Scotland (mandatory in corridors and communal areas) |
| September | | | (1st) England (school/college discretion in indoor communal areas) |
Table 1 – Timeline of policy decisions about the wearing of face coverings by the general public in Scotland vs. England.

Scotland has also had a lower incidence of Covid-19 than England. According to official UK government data, as of 30th December, 23 people per 1,000 had had at least one positive Covid-19 test in Scotland, in contrast with 39 people per 1,000 in England.

This blog post is concerned with references to face masks and face coverings in Scottish vs. national UK newspapers between December 2019 and August 2020, that is from the start of reports about a new type of pneumonia in Wuhan, China, up to the beginning of the 2020-21 school year in the UK.

2 Research questions

Overarching research question

How does press reporting on face masks and face coverings in Scotland compare with national UK reporting between December 2019 and August 2020?

Specific research questions

  1. How did the frequency of use of ‘face covering(s)’ vs. ‘face mask(s)’ change over time in Scottish vs. national press reporting?
  2. Were there any statistically significant differences in the relative frequencies of the use of ‘face mask(s)’ and ‘face covering(s)’, and of terms relating to places where face masks/coverings may be used, in Scottish vs. national press reporting?
  3. What are the differences and similarities in the collocations (co-occurrence of words) of ‘face mask(s)’ vs. ‘face covering(s)’ in Scottish and national press reporting?

3 Findings in brief

Finding 1 – Over time, ‘face covering(s)’ became more frequent than ‘face mask(s)’ in the Scottish press, but not in the national press.

Finding 2 – ‘Face covering(s)’ are mentioned much more often, relatively speaking, in the Scottish press than in the national press, alongside other terms for public indoor environments where they may be worn.

Finding 3 – Face ‘mask(s)’ and ‘covering(s)’ have partly different collocates, reflecting differences in status and associated narratives.

4 Data

The news aggregator service LexisNexis was used to collect articles that contained either the phrase ‘face mask(s)’ or ‘face covering(s)’ and that were published in a selection of national and Scottish newspapers in the period between 01.12.2019 and 31.08.2020.

Table 2 provides the numbers of texts and words included in each of the resulting two corpora: the Scottish Corpus and the National Corpus. For the National Corpus, we also provide figures for articles extracted from ‘broadsheet’ vs. ‘tabloid’ newspapers, constituting the Broadsheet and Tabloid subcorpora. (NB: For the national newspapers specifically, we selected the national editions only, thus excluding the Irish, Scottish and Northern Ireland editions.) Figures 1 to 4 below show the number of articles per newspaper title within each corpus.

| Corpus | Number of texts | Number of words |
| --- | --- | --- |
| National Corpus | 11,536 | 19,401,316 |
| The Broadsheet subcorpus | 6,631 | 16,657,194 |
| The Tabloid subcorpus | 2,419 | 1,264,952 |
| Scottish Corpus | 1,084 | 588,894 |
Table 2: Number of texts and total number of words comprising each corpus
Figure 1: Number of texts from each national title

Figure 2: Number of texts from each broadsheet title
Figure 3: Number of texts from each tabloid title

Figure 4: Number of texts from each Scottish newspaper title

The Broadsheet subcorpus is by far the largest of all datasets, both in terms of the number of texts and the number of words (Table 2). Within that subcorpus, The Guardian and The Observer account for the highest number of articles, corresponding to 36% of texts and 83% of the words in that subcorpus (13,744,333 out of 16,657,194). The number of texts is more evenly distributed in the Tabloid subcorpus (Figure 3). The Daily Mail accounts for the largest number of texts (20%) but it is closely followed by The Express (17%), and by The Sun and Evening Standard (15% each). Within the Scottish corpus, most texts come from The Daily Record and The National (32% each).

5 Method

To answer question 1, we plotted the frequencies of the search terms used to collect the texts that comprise the corpora, ‘face mask(s)’ and ‘face covering(s)’. These figures give us an indication of how the level of attention fluctuated in the National and Scottish press over time.

To answer question 2, we carried out a ‘keyword’ analysis of the Scottish Corpus as compared with the National Corpus as a whole. Keywords are words that are much more frequent in a corpus of interest (known as the ‘study corpus’) than they are in another corpus (known as the ‘reference corpus’), where the difference is statistically significant. They can be interpreted as reflecting the most distinctive concepts and themes in a particular corpus. The analysis was carried out using WordSmith Tools, version 7.

For the calculation of keywords, we established that the candidate keyword should occur in at least 5% of texts in the study corpus. This thus determined the minimum frequency of each term, which varied from one corpus to another. The minimum frequency was 577 instances in the National Corpus and 54 in the Scottish Corpus. In terms of statistical tests, we combined the log-likelihood test (a statistical measure of confidence) with log-ratio as the effect size measure, using the following thresholds: a critical value higher than 15.13 (p < 0.0001) for the log-likelihood test and 1.5 as the minimum log-ratio score, discarding negative scores. Keywords were then grouped by theme through close reading of the concordance lines, that is, individual occurrences of each word with the preceding and following stretches of text.
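The keyword criteria above can be sketched in a few lines of Python. This is only an illustration of the general technique, not the WordSmith Tools implementation: the function names are ours, the log-likelihood formula is the common two-corpus version, the 0.5 substitution for zero reference frequencies and the rounding of the 5% cut-off are assumptions, and the word frequencies in the example call are invented. Only the corpus sizes, text counts and thresholds come from this post.

```python
import math

def log_likelihood(freq_study, freq_ref, size_study, size_ref):
    """Two-corpus log-likelihood for a single word."""
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    ll = 0.0
    for observed, expected in ((freq_study, expected_study),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll

def log_ratio(freq_study, freq_ref, size_study, size_ref, zero_adjust=0.5):
    """Binary log of the ratio of relative frequencies (effect size)."""
    # Assumption: a zero reference frequency is replaced by 0.5.
    freq_ref = freq_ref if freq_ref > 0 else zero_adjust
    return math.log2((freq_study / size_study) / (freq_ref / size_ref))

def min_frequency(n_texts, share=0.05):
    """Minimum frequency derived from the 5% cut-off (rounding is assumed)."""
    return round(n_texts * share)

def is_keyword(freq_study, freq_ref, size_study, size_ref, n_texts_study,
               ll_threshold=15.13, lr_threshold=1.5):
    """Apply the minimum-frequency, log-likelihood and log-ratio criteria."""
    return (freq_study >= min_frequency(n_texts_study)
            and log_likelihood(freq_study, freq_ref,
                               size_study, size_ref) > ll_threshold
            and log_ratio(freq_study, freq_ref,
                          size_study, size_ref) >= lr_threshold)

SCOTTISH_WORDS, NATIONAL_WORDS = 588_894, 19_401_316  # from Table 2
SCOTTISH_TEXTS = 1_084                                # from Table 2

# Hypothetical word: 600 hits in the Scottish Corpus, 4,000 in the National.
print(is_keyword(600, 4_000, SCOTTISH_WORDS, NATIONAL_WORDS, SCOTTISH_TEXTS))
```

Requiring a positive log-ratio of at least 1.5 means that only words overused in the study corpus pass, which matches the decision to discard negative scores.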

To answer question 3, we carried out a ‘collocation’ analysis of the terms ‘face mask(s)’ and ‘face covering(s)’. Collocation analyses explore co-occurrence relationships between words, and therefore make it possible to study the narratives or discourses that a word is part of. A word collocates with another if it is more likely to be found in close proximity to the other word than elsewhere. Collocations were generated by means of the software package LancsBox, on the basis of the criteria below:

  • Span of 5:5 – a window of five words to the left and five words to the right of the search word.
  • Mutual Information (MI) score ≥ 6. MI is a statistical procedure widely employed in corpus studies to indicate how strong the association between two words is. It is calculated by considering their frequency of co-occurrence in relation to their frequencies when occurring independently in each corpus.
  • Minimum frequency of collocation: 10 occurrences per 1,000 instances of term in question. For example, ‘face mask(s)’ occurs 1,672 times in the Scottish corpus; the minimum frequency of collocation was therefore 17 instances.
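The three criteria above can be illustrated with a short Python sketch. It uses a common formulation of MI in which the expected co-occurrence frequency is corrected for the size of the collocation window (10 words for a 5:5 span); the exact calculation in LancsBox may differ, the function names are ours, rounding the minimum frequency up is an assumption, and the figures in the last line are invented. Only the 1,672-instance example comes from the text.

```python
import math

def mutual_information(cooccurrences, node_freq, collocate_freq,
                       corpus_size, window=10):
    """MI = log2(observed / expected), with a window-size correction."""
    expected = node_freq * collocate_freq * window / corpus_size
    return math.log2(cooccurrences / expected)

def min_collocation_freq(node_freq, per_thousand=10):
    """10 co-occurrences per 1,000 instances of the node (rounded up)."""
    return math.ceil(node_freq * per_thousand / 1000)

def is_collocate(cooccurrences, node_freq, collocate_freq,
                 corpus_size, mi_threshold=6.0):
    """Apply both the minimum-frequency and the MI criterion."""
    return (cooccurrences >= min_collocation_freq(node_freq)
            and mutual_information(cooccurrences, node_freq,
                                   collocate_freq, corpus_size) >= mi_threshold)

# The worked example from the text: 1,672 instances of the node.
print(min_collocation_freq(1_672))

# Hypothetical candidate collocate (all four figures are invented).
print(is_collocate(64, 1_000, 1_000, 10_000_000))
```

Combining a frequency floor with MI is a common way of offsetting MI's tendency to favour rare word pairs.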

Similar to the analysis of keywords, collocations were analysed by close reading of their concordance lines.

6 Findings

Finding 1 – Over time, ‘face covering(s)’ became more frequent than ‘face mask(s)’ in the Scottish press, but not in the national press.

Figures 5-6 show the frequency distribution of the terms ‘face mask(s)’ and ‘face covering(s)’ in the two corpora across time, considering the relative frequencies of terms (per 100,000 words). Note that the scale varies from one chart to another; that is due to differences in the amount of data from each corpus.
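For readers unfamiliar with relative frequencies, the normalisation is simply the raw count scaled to a common base of 100,000 words, as in the minimal Python sketch below. The function name is ours and the monthly word total is a made-up placeholder; the two raw counts are the April 2020 figures for the Scottish Corpus reported later in this post.

```python
def relative_frequency(count, corpus_words, per=100_000):
    """Occurrences of a term per `per` words of running text."""
    return count * per / corpus_words

# Raw April 2020 counts for the Scottish Corpus, as reported in this post.
april_counts = {"face mask(s)": 181, "face covering(s)": 83}
april_words = 70_000  # hypothetical monthly word total, for illustration only

for term, count in april_counts.items():
    print(term, round(relative_frequency(count, april_words), 1))
```

Normalising in this way is what allows corpora of very different sizes to be compared.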


Figure 5: Relative frequencies of ‘face covering(s)’ and ‘face mask(s)’ in the National Corpus

Figure 6: Relative frequencies of ‘face covering(s)’ and ‘face mask(s)’ in the Scottish Corpus

As can be seen, both corpora show a clear preference for the term ‘face mask(s)’ in the early months, from December 2019 to March 2020, with hardly any mention of the term ‘face covering(s)’. Scottish newspapers seem to have embraced the term first, with mentions of ‘face covering(s)’ increasing swiftly in April 2020, corresponding to nearly half of the number of mentions of ‘face mask(s)’ in that month (83 as compared with 181 instances). National newspapers showed a modest increase in the mentions of ‘face covering(s)’ in April; the term ‘face mask(s)’ was nearly six times more frequent than ‘face covering(s)’ in the national newspapers (2,241 in relation to 386 instances). Mentions of ‘face covering(s)’ continued to rise across both corpora in the following months. In May, they represented about half of the number of mentions of ‘face mask(s)’ in the Scottish corpus and about a third in the National Corpus. By June, mentions of ‘face covering(s)’ surpassed those of ‘face mask(s)’ in Scottish newspapers. In national newspapers, ‘face mask(s)’ remained more frequent than ‘face covering(s)’ across the entire period.

Finding 2 – ‘Face covering(s)’ are mentioned much more often, relatively speaking, in the Scottish press than in the national press, alongside other terms for public indoor environments where they may be worn.

The words ‘covering’ and ‘coverings’, which tend to occur in the phrase ‘face covering(s)’, were found to be ‘key’ or ‘overused’ in the Scottish as compared with the National Corpus. In other words, ‘covering’ and ‘coverings’ are used much more often, in terms of relative frequencies, in the Scottish Corpus than in the National Corpus, based on our thresholds for effect size (log-ratio) and statistical significance (log-likelihood). However, based on the same thresholds, the word ‘mask(s)’ is not overused in the National corpus as compared with the Scottish Corpus. This means that ‘covering(s)’ in the Scottish Corpus is not in complementary distribution to ‘mask(s)’ in the National Corpus.

Overall, the keyword calculation retrieved 41 overused items in the Scottish Corpus, using the National Corpus as reference. Table 3 includes the complete list of keywords in the Scottish Corpus, grouped thematically and then ordered by their frequency of occurrence in the corpus.

Table 3 shows that the keywords in the Scottish Corpus include three other terms that are related to face coverings (‘mandatory’, ‘worn’ and ‘mouth’) as well as groups of words that relate to the different environments where face coverings may or may not be recommended or mandatory: Space (e.g. ‘indoor’, ‘outdoor’, ‘household’), Retail/hospitality (e.g. ‘shop’, ‘hospitality’) and Education (e.g. ‘pupils’, ‘teachers’).

Table 3: ‘Keywords’ in the Scottish Corpus, grouped by theme

The overuse of the word ‘kids’ reflects discussions about the age at which face masks/coverings should be made compulsory, as expressed by a reader’s comment published by The Glasgow Evening Times (Extract 1):

(1) “I AM so confused myself. Our kids are going with no distancing and in shops and malls and cinemas and public transport and airports. There is this hype of distancing. Which one is right? Are the poor kids so strong that they will not catch it at all and will not bring anything back home to their elderly grans etc? So illogical!” (The Glasgow Evening Times, 21.08.2020)

The keywords also include a group that is to do with Other Measures to reduce contagion, particularly in public spaces such as shops, restaurants and pubs (e.g. ‘screens’, ‘two-metre’). This is because face coverings are often presented as necessary when those other measures are not practicable:

(2) The government guidance says: “If you can, wear a face covering in an enclosed space where social distancing isn’t possible and where you will come into contact with people you do not normally meet.” (The National, 25.06.2020)

Finding 3 – Face ‘mask(s)’ and ‘covering(s)’ have partly different collocates, reflecting differences in status and associated narratives.

We now examine the collocates of ‘face mask(s)’ and ‘face covering(s)’ in the two corpora. These are listed in Tables 4 and 5, in decreasing order of frequency of co-occurrence with each term.


Table 4: Collocations of ‘face mask(s)’ and ‘face covering(s)’ in the Scottish Corpus

Table 5: Collocations of ‘face mask(s)’ and ‘face covering(s)’ in the National Corpus

Five words appeared as collocates of both ‘face mask(s)’ and ‘face covering(s)’ in both corpora. These are: three different forms of the verb ‘wear’ (‘wear’, ‘wearing’, ‘worn’), ‘compulsory’ and ‘mandatory’. These suggest that ‘mask(s)’ and ‘covering(s)’ are both used in the context of debates and decisions about the need or obligation to wear them in certain settings.

Figure 7: Instances of ‘face mask(s)’ in the Scottish Corpus

However, the collocates that only apply to ‘face mask(s)’ show that they tend to be talked about as a type of PPE in clinical or care settings (e.g. ‘protective’, ‘surgical’, ‘gloves’, ‘aprons’).

(3) Carers, many of whom are paid low wages by private sector firms, have complained they have not been provided with essential items such as hand sanitiser, gloves, aprons, and face masks. (The Independent, 24.03.2020)

In contrast, the collocates that only apply to ‘covering(s)’ show that they tend to be talked about as a non-medical item of clothing that is:

  • made of cloth and a potential fashion accessory or political statement (‘cloth’, ‘branded’);

(4) Currently no other party is selling branded face coverings, although many independent online shops stock masks with Union flag or political designs. (The National, 25.07.2020)

(5) Face coverings include scarves, a piece of cloth or a mask and certain travellers – such as people with disabilities or breathing difficulties – will be exempt. (The Daily Express, 06.06.2020)

  • recommended to be worn (e.g. ‘recommended’, ‘advised’);

(6) Earlier this week, First Minister Nicola Sturgeon recommended the limited use of face coverings – not necessarily masks – when social distancing is hard to maintain. (Glasgow Evening Times, 04.05.2020)

(7) Other precautions advised include wearing face coverings in public as much as possible, keeping two metres apart, avoiding physical contact with those outside one’s household and to be tested and isolate if told to do so. (The Telegraph, 18.07.2020)

  • in specific indoor public settings (‘crowded’, ‘enclosed’; ‘shops’, ‘transport’);

(8) “However, we are recommending you do wear a cloth face covering if you are in an enclosed space with others where social distancing is difficult – for example, on public transport, or in a shop.” (The National, 28.04.2020)

(9) It is compulsory to wear face coverings on public transport, in shops and when collecting takeaway food. (The Sun, 14.08.2020)

  • by large sections of the population (‘secondary’, ‘pupils’, ‘passengers’).

(10) A SECONDARY school is asking pupils to wear face coverings as part of efforts to combat the spread of coronavirus. (The Herald, 23.08.2020)

(11) Passengers have been told to wear face coverings on public transport to prevent a further outbreak of coronavirus as Britain slowly emerges from the lockdown. (The Times, 12.05.2020)

What does not, however, emerge from the collocates of ‘face covering(s)’ in either corpus is a consistent message about their role in protecting others from droplets produced by the wearer, thus reducing transmission overall. This may partly explain ongoing opposition to or scepticism about the usefulness of face coverings during the pandemic.

7 Conclusions

Overall, in the period December 2019 – August 2020, reports on face mask(s)/covering(s) in the Scottish press contrasted with the national press in terms of: a preference for ‘face covering(s)’ over ‘face mask(s)’ from June 2020 onwards; and a greater concern for their use to mitigate the transmission of the virus in schools, shops and other public indoor environments. This can only be partly explained by the fact that the Westminster government made decisions about the recommended/mandatory use of face coverings in public indoor spaces slightly later than the Scottish devolved administration. The contrasting collocates of ‘face mask(s)’ vs. ‘face covering(s)’ confirm that they are associated with different settings and narratives: PPE in clinical/care settings vs. item of clothing/accessory to be worn in public indoor environments by the general population as a public health measure. In the period under consideration, the latter narrative was therefore increasingly prevalent in Scottish but not in national newspapers.

Introductory Blog – Hanna Schmueck

I am very honoured to have received the Geoffrey Leech Outstanding MA Student Award for my MA in Language and Linguistics. This award traditionally goes to the MA student with the highest overall average.

I started my postgraduate journey in September 2019 after finishing my undergraduate degree at the University of Bamberg (Germany) in 2018 and working as a freelance translator and teacher for a year. I’ve always had an interest in the way language influences us both as individuals and as a society and have carried with me a fascination for experimentation and statistics. I first discovered corpus linguistics in the second year of my undergraduate degree, and it soon became my primary research interest. I chose a corpus-based project for my undergraduate dissertation on pronouns in the English-lexifier lingua franca Bislama. From here I realised that much of the relevant methodological literature had been published by Lancaster academics – which cemented my decision to apply to Lancaster despite having to move abroad and face a number of Brexit-related administrative hurdles.

When I finally came to Lancaster for my MA, I felt welcome in the department from day one and I had the chance to attend/audit a wide variety of modules such as Cognitive Linguistics, Experimental Approaches to Language and Cognition, Forensic Linguistics, Stylistics, and Corpus Linguistics. The freedom of choice that Lancaster MA students in Language and Linguistics are given was another major motivation for studying at Lancaster and the flexible approach really benefited my personal learning experience. Another important element of my academic learning experience was being able to attend research groups – such as the Trinity group and UCREL talks – which focus on a wide variety of topics and allow you to come into contact with people who have all kinds of specialisms while getting the opportunity to develop your own research interests further.

I had, like all of us, not foreseen that my MA would move online in spring and all the challenges COVID-19 would bring about, but after the first phase of getting used to the situation I tried my best to see this as an opportunity to focus on my MA thesis titled “More than the sum of its parts: Collocation networks in the written section of the BNC2014 Baby+”. The aim of this thesis was to explore corpus-wide collocation networks and their structural and graph-theoretical properties using the BNC2014 Baby+ as the underlying dataset. I developed a method to create and display large MI2-score based weighted networks in order to analyse the meta-level collocational patterns that emerge, and performed a graph-theoretical analysis on them. The results obtained from this pilot study suggested that there is an underlying structure that all sections in the BNC2014 Baby+ share and that the structure of the generated networks resembles other networks from a wide variety of phenomena such as power grids, social networks, and networks of brain neurons. The findings indicated that there are, however, text-type specific differences in terms of how connected different topic areas are and that certain words serve as hubs connecting topics with one another. The network displayed below is an example taken from the BNC2014 Baby+ academic books section with a filter applied to only show the node “award”, its direct neighbours and their weighted interrelations.
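To give a rough idea of what an MI2-weighted collocation network and the “award” filter look like in practice, here is a small Python sketch. It is a toy illustration of the general technique rather than the method developed in the thesis: the function names, the window correction, the threshold of 9 and all the frequencies are invented for the example.

```python
import math
from collections import defaultdict

def mi2(cooccurrences, freq1, freq2, corpus_size, window=10):
    """MI2 = log2(observed squared / expected), with a window correction."""
    expected = freq1 * freq2 * window / corpus_size
    return math.log2(cooccurrences ** 2 / expected)

def build_network(pair_counts, word_freqs, corpus_size, min_mi2):
    """Undirected weighted graph: an edge for each pair scoring >= min_mi2."""
    graph = defaultdict(dict)
    for (w1, w2), cooc in pair_counts.items():
        score = mi2(cooc, word_freqs[w1], word_freqs[w2], corpus_size)
        if score >= min_mi2:
            graph[w1][w2] = score
            graph[w2][w1] = score
    return graph

def ego_network(graph, node):
    """Keep the node, its direct neighbours and the edges among them."""
    keep = {node} | set(graph.get(node, {}))
    return {w: {n: s for n, s in graph[w].items() if n in keep}
            for w in keep if w in graph}

# Hypothetical toy data (not taken from the BNC2014 Baby+).
word_freqs = {"award": 50, "prize": 40, "ceremony": 30,
              "winner": 60, "table": 500}
pair_counts = {("award", "prize"): 12, ("award", "ceremony"): 8,
               ("prize", "winner"): 10, ("prize", "ceremony"): 5,
               ("winner", "table"): 1}

graph = build_network(pair_counts, word_freqs,
                      corpus_size=1_000_000, min_mi2=9)
ego = ego_network(graph, "award")
for word, neighbours in sorted(ego.items()):
    print(word, sorted(neighbours))
```

In this toy example “winner” is linked to “prize” but not to “award”, so it drops out of the filtered view, while the link between “prize” and “ceremony” is kept because both are direct neighbours of “award”.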

I am very grateful for having had the opportunity to learn from and exchange ideas with so many amazing academics in the department over the course of my MA and I’m very excited to carry on researching collocation networks for my PhD here at Lancaster.

ICR Outstanding Corpus Thesis Award for Lancaster PhD graduate

I am honoured to have received the Institute for Corpus Research Outstanding Doctoral Thesis Award. The purpose of this annual award is to recognise and reward theses in the field of Corpus Linguistics.

I conducted my PhD research in the Centre for Corpus Approaches to Social Science at Lancaster University, which is part of the Department of Linguistics and English Language. My thesis was titled Collocational Processing in Typologically Different Languages, English and Turkish: Evidence from Corpora and Psycholinguistic Experimentation. Some of the findings based on my PhD research are reported in this article. The study was multidisciplinary, involving both corpus analysis and psycholinguistic experimentation. My supervisors, Dr Vaclav Brezina and Prof Patrick Rebuschat, played a key role in shaping the thesis. Their academic knowledge and insight have been invaluable in developing a multidisciplinary perspective to pioneer a contrastive study of English and Turkish.

Turkish, with its rich morphology, differs from English – prompting questions about whether the same variables affect collocational processing in the two languages. Importantly, so far the vast majority of research on collocational processing has focussed on a narrow range of primarily European languages, especially English, which makes it difficult to generalise the findings to other languages. Corpus analyses showed that uninflected collocations have similar mean frequencies and association counts in both languages. When inflected forms were included, 75% of the Turkish collocations occurred at a higher frequency than the collocations in English, suggesting that language typology impacts frequency of collocations.

I then conducted psycholinguistic experiments to understand the differences and similarities between the processing of collocations in English and Turkish and by native and non-native English speakers. To what extent is there a difference between native-speakers’ (of English and Turkish) sensitivity to both individual word-level and phrase level frequency information when processing collocations? Mixed-effects regression modelling revealed that Turkish and English native-speakers are equally sensitive to collocation frequencies, confirming collocations’ psychological reality in both languages. Yet English speakers were additionally affected by individual word-frequencies, indicating that language typologies require users to process collocations from different sources of information.

Furthermore, this thesis investigated the effects of individual word and collocational frequency on native and non-native speakers’ collocational processing in English. Both groups of participants demonstrated sensitivity to individual word and collocation frequency. The findings align with the predictions of usage-based approaches that language acquisition should be viewed as a statistical accumulation of experiences that changes every time we encounter a particular utterance.

This study identified both universal fundamentals and language-specific differences in collocational processing. It addressed language typology and second-language learning through a novel multidisciplinary approach which reinforces and challenges usage-based theories of language learning, demonstrating that they should include typologically different languages to develop broader perspectives on processing.

Please see the link here for more information about this award.

If you have any questions, or are interested in working with me, get in touch: Dr Doğuş Can Öksüz, Research Fellow at the University of Leeds (d.oksuz@leeds.ac.uk).