Representation of the Sea in the UK Press: Public Awareness of the Oceans

Carmen Dayrell, Basil Germond (Lancaster University) and Celine Germond-Duret (Liverpool John Moores University)

  1. Introduction

5th November 2021 was COP26’s Ocean Action Day. The UK Presidency of the conference stressed the need “to take ambitious steps towards ocean health and resilience” in order to contribute to “our fight against climate change”. Ocean sustainability is contingent on citizens’ awareness of “the benefits they receive from the marine environment” (DEFRA, 2021, p.4). However, the sea is at the bottom of the list when it comes to public perception of global environmental issues (Potts et al., 2016).

This study examines the representation of the sea in the UK press, focussing on articles published in 2020 and analysing both national (broadsheets and tabloids) and regional newspapers (see section 3 for details). The goal is to unravel the way the sea is represented in the British written press, which is the third source of information about the marine environment (DEFRA, 2021, p.23). More specifically, we seek to explore: (i) the extent to which the sea is represented in purely technical, economic and opportunistic terms as opposed to emotional and identity terms; and (ii) how the media representation of the sea can inform our understanding of citizens’ connection to the sea.

  2. Findings in brief

Finding 1: The narrative frequently represents the sea in terms of economic opportunities: resources, profit and job creation.

Finding 2: The ‘marine environment’ (and not the ‘sea’ itself) is represented as a natural resource that must be preserved and protected, especially in view of sustaining the economic benefits from the sea.

Finding 3: Newspapers frequently stress the negative impacts of climate change on the sea.

Finding 4: An emotional lexicon can only be found in relation to aesthetic considerations.

Finding 5: A sense of place can be found in relation to the seashore/coastal locations.

  3. Data

The analysis draws on three separate datasets, or ‘corpora’: the two subcorpora of The National Corpus (The National Broadsheet Subcorpus and The National Tabloid Subcorpus) and The Regional Corpus. Table 1 provides the number of texts and words included in each corpus.

Corpus                        | Number of texts | Number of words
National Corpus               | 28,406          | 22,076,726
National Broadsheet Subcorpus | 20,099          | 17,651,577
National Tabloid Subcorpus    | 8,307           | 4,425,169
Regional Corpus               | 11,545          | 6,700,126

Table 1: Number of texts and total number of words comprising each corpus

Figures 1 to 3 show the selection of individual newspaper titles within each dataset, with the overall number of articles per newspaper title.


Figure 1: Number of texts from each national broadsheet newspaper

Figure 2: Number of texts from each national tabloid newspaper
Figure 3: Number of texts from each regional newspaper

Articles were published between 1/1/2020 and 31/12/2020. All texts were collected from a news aggregator service (LexisNexis), considering the printed version of the newspapers in their weekday and Sunday versions. The collection of individual texts proceeded on the basis of specific terms: we searched for articles that contained either ‘sea(s)’ or ‘ocean(s)’ and considered any type of article in which those words appeared. This means that, in addition to news reports, the corpora include other types of texts such as editorials and letters to the editor.

For the national newspapers specifically, we selected the national editions only, thus excluding the Irish, Scottish and Northern Ireland editions. Duplicates were removed from all corpora. However, for The Regional Corpus specifically, we kept articles that were published across different newspapers (usually part of the same media group) given that they would reach different audiences.

  4. Methods

We used analytical techniques associated with the field of Corpus Linguistics to study the dominant narratives in the national and regional newspapers, treating all newspaper titles within each corpus in aggregate. The corpora were processed using WordSmith Tools version 7 and the software package #LancsBox.

To provide an overview of the most distinctive linguistic characteristics of each corpus, we carried out ‘keyword’ analyses. Keywords are words that are more frequent in a corpus of interest (known as the ‘study’ corpus) than they are in another corpus (known as the ‘reference corpus’), where the difference is statistically significant. They can be interpreted as reflecting the most distinctive concepts and themes in a particular corpus.

We carried out three separate comparisons. We first generated the keywords in the National Corpus using the Baby+ edition of the British National Corpus 2014 (BNC2014-baby) as the reference corpus. We then carried out a similar procedure in the Regional Corpus. These procedures identified words that were salient in the National and Regional Corpora respectively in relation to a general corpus of British English. We then generated the keywords in the Regional Corpus using the National Corpus as the reference corpus so that we could identify words that were prominent in the Regional but not in the National Corpus.

For the calculation of keywords, we focused on words that occurred with a minimum frequency of 100 occurrences per million words, in at least 1% of the total number of texts in the study corpus. This was to ensure that the analysis focused on words that were overall relatively frequent and occurred across various texts. In terms of statistical tests, we combined a statistical test of significance (the log-likelihood test) with an effect size measure (Log-Ratio). The log-likelihood test tells us to what extent differences in frequencies between the two corpora (the study and the reference corpus) are statistically significant. It was applied considering a critical value higher than 6.63 (p < 0.01). Log-Ratio measures how big the difference is: the higher the Log-Ratio score, the larger the difference. The Log-Ratio calculation therefore gives us the words whose frequencies are proportionally higher in the study corpus. For the analysis, we focused on the 20 keywords with the highest Log-Ratio score in each calculation.
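As an illustration of the two statistics described above, the following is a minimal sketch in Python. It assumes the commonly used two-corpus log-likelihood formulation (observed versus expected frequencies) and Log-Ratio as the binary logarithm of the ratio of relative frequencies. The function names, the 0.5 substitution for zero frequencies and the example counts are illustrative assumptions; this is not the code of WordSmith Tools or #LancsBox, and the 1%-of-texts dispersion criterion is not implemented here.

```python
import math


def log_likelihood(freq_study, freq_ref, size_study, size_ref):
    """Log-likelihood (G2) for one word, comparing a study and a reference corpus."""
    combined = freq_study + freq_ref
    total = size_study + size_ref
    # Expected frequencies if the word were equally common in both corpora.
    expected_study = size_study * combined / total
    expected_ref = size_ref * combined / total
    score = 0.0
    if freq_study > 0:
        score += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        score += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * score


def log_ratio(freq_study, freq_ref, size_study, size_ref, zero_adjust=0.5):
    """Binary log of the ratio of relative frequencies (effect size).

    zero_adjust is an assumed substitute for a zero frequency, to avoid
    division by zero or log of zero.
    """
    freq_study = freq_study or zero_adjust
    freq_ref = freq_ref or zero_adjust
    return math.log2((freq_study / size_study) / (freq_ref / size_ref))


def is_keyword(freq_study, freq_ref, size_study, size_ref,
               critical_value=6.63, min_per_million=100):
    """Apply the thresholds described in the text (except the 1%-of-texts range)."""
    per_million = freq_study / size_study * 1_000_000
    return (
        per_million >= min_per_million
        and log_likelihood(freq_study, freq_ref, size_study, size_ref) > critical_value
        and log_ratio(freq_study, freq_ref, size_study, size_ref) > 0
    )
```

With made-up counts, a word occurring 200 times per million in the study corpus and 100 times per million in the reference corpus has a Log-Ratio of 1 (it is twice as frequent), since each additional point of Log-Ratio corresponds to a doubling of the relative frequency.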

Keywords were interpreted by examining their ‘collocations’ through close reading of their ‘concordance lines’. Collocation analyses explore co-occurrence relationships between words, and therefore make it possible to study the narratives or discourses that a word is part of. Concordance lines refer to individual occurrences of each word with the preceding and following stretches of text. For The National Corpus, we examined the broadsheets and the tabloids separately so that we could determine similarities and differences in their narratives.
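To make the notion of a concordance line concrete, here is a minimal sketch in Python. It assumes the text has already been split into tokens; the function name and the `context` parameter are hypothetical and do not reproduce the concordancers in WordSmith Tools or #LancsBox.

```python
def concordance(tokens, node, context=5):
    """Return (left context, node, right context) for each occurrence of node."""
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == node.lower():
            left = " ".join(tokens[max(0, i - context):i])
            right = " ".join(tokens[i + 1:i + 1 + context])
            lines.append((left, token, right))
    return lines


# Example using the opening words of one of the extracts quoted in this report.
tokens = "Sea level rises have accelerated in recent decades".split()
for left, node, right in concordance(tokens, "level", context=2):
    print(f"{left:>20}  [{node}]  {right}")
```

Aligning the search word in a central column, as the print statement does, is what allows recurrent patterns to the left and right to be read off quickly.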

Collocations were generated on the basis of the following criteria:

  • Span of 5:5 – a window of five words to the left and five words to the right of the search word;
  • Mutual Information (MI) score ≥ 6. MI is a statistical procedure widely employed in corpus studies to indicate how strong the association between two words is. It is calculated by considering their frequency of co-occurrence in relation to their frequencies when occurring independently in each corpus.
  • Minimum frequency of collocation: 1% of the frequency in the study corpus.
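The span and MI criteria above can be sketched as follows in Python. This is a simplified illustration under stated assumptions: MI is computed as the binary logarithm of observed over expected co-occurrence, with the expected value taken as f(node) × f(collocate) / N and no window-size correction; the tools used in the study may apply a different variant. The function name and the `min_cooc` parameter are hypothetical, and the 1% minimum-frequency criterion is not interpreted here.

```python
import math
from collections import Counter


def collocates(tokens, node, left=5, right=5, min_mi=6.0, min_cooc=1):
    """Find collocates of node within a left:right window, filtered by MI score."""
    n = len(tokens)
    freq = Counter(tokens)
    cooc = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        # Count every word inside the window around this occurrence of the node.
        for j in range(max(0, i - left), min(n, i + right + 1)):
            if j != i:
                cooc[tokens[j]] += 1
    results = {}
    for word, observed in cooc.items():
        if observed < min_cooc:
            continue
        # Expected co-occurrence if the two words were independent (assumed formula).
        expected = freq[node] * freq[word] / n
        mi = math.log2(observed / expected)
        if mi >= min_mi:
            results[word] = (observed, mi)
    return results
```

Because MI divides by the collocate's overall frequency, very common words near the node tend to score low, while rarer words that appear mostly next to the node score high.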

  5. Findings

Tables 2 and 3 list the keywords in the National (broadsheets and tabloids together) and Regional Corpora respectively, in relation to the general language corpus, grouped by theme.

Themes                  | Keywords
Sea                     | sea, ocean, coast, islands, marine, beaches, seas, oceans
Brexit                  | Brexit
Industry                | fishing
Means of transportation | boats
Travelling              | flights
Measures                | lockdown, restrictions
Pandemic                | covid, coronavirus, pandemic, virus
Politicians             | Trump, Boris

Table 2: ‘Keywords’ in the National Corpus in relation to the BNC2014-baby, organised by theme

Themes                  | Keywords
Geographical references | Plymouth, Aberdeen, Norfolk, Cornwall, Hove
Sea                     | sea, marine
Means of transportation | vessel, vessels
Distance                | offshore
Rescue service          | coastguard, lifeboat, RNLI
Place                   | harbour
Division                | chopped
Measures                | lockdown, distancing
Pandemic                | covid, coronavirus, pandemic

Table 3: ‘Keywords’ in the Regional Corpus in relation to the BNC2014-baby, organised by theme

Table 4 provides the keywords in the Regional Corpus in relation to the National Corpus, organised by theme. These keywords highlight themes which occurred with a relatively higher frequency in the Regional Corpus as compared with the National Corpus.

Theme                   | Keywords
Government              | council
Rescue service          | coastguard, lifeboat, RNLI
Place                   | pier
Geographical references | Plymouth, Aberdeen, Brighton, Suffolk, County, Norwich, Ipswich, Welsh, Southampton, Belfast, Hove, Durham

Table 4: Top 20 keywords in the Regional Corpus in relation to the National Corpus, organised by theme.

As discussed below, these keywords uncovered various ways in which the sea is represented in the UK press. Note the prominence of words referring to the Covid-19 pandemic and measures to control the spread of the virus (cf. the themes of Pandemic and Measures in Tables 2 and 3). This is not surprising since our study corpora are restricted to news texts published in the year of the pandemic (2020), while the BNC2014-baby, which predates the Covid-19 pandemic, comprises a wide range of text genres and covers a longer period, from 2010 to 2017. This also explains the overuse of prominent politicians’ names (‘Boris’ and ‘Trump’) in the National Corpus (Table 2).

Finding 1: The narrative frequently represents the sea in terms of economic opportunities: resources, profit and job creation.

In both broadsheet and tabloid newspapers, prominence is given to negotiations around the fishing industry after Brexit (see Brexit and Industry in Table 2). Here the discourse revolves around the terms and conditions for European fleets to fish in UK waters (Extract 1), thus highlighting the relevance of the sea as an economic resource and the importance of protecting one’s economic rights in the current political climate. In addition to collocating with ‘Brexit’ in both subcorpora, the keyword ‘fishing’ collocates with types of sea transport (‘boat(s)’, ‘fleet(s)’, ‘vessel(s)’, and ‘trawler’), fishing gear (‘gear’ and ‘nets’) as well as words indicating some kind of restriction (such as ‘quotas’, ‘illegal’, ‘rights’, ‘access’). There are also mentions of how the Brexit deal would affect fishing ‘villages’ and ‘communities’ in the UK and EU countries.

  • But the EU has been clear that the price of access to its markets must be access for its fishing fleets to British waters (The Times, 30/01/2020).
  • The row over fishing rights following Britain’s EU departure still threatens to collapse trade and security discussions after another week of wrangling ended in deadlock (The Sun, 22/11/2020).

References to economic resources are also seen through the collocations of ‘sea’ with ‘North’ across the three corpora (see Figures 4-6), which point towards mentions of oil production in the North Sea.


Figure 4: Collocates of ‘sea’ in the National Broadsheet Corpus

Figure 5: Collocates of ‘sea’ in the National Tabloid Corpus

Figure 6: Collocates of ‘sea’ in the Regional Corpus

While national newspapers frequently mention the oil industry’s revenues, regional newspapers report on the discussions on the North Sea Transition Deal. The North Sea is also mentioned in relation to measures to curb the spread of Covid-19 on UK platforms and avoid job losses (Extract 4).

  • By the mid-80s, the North Sea was providing 10% of the Treasury’s revenues, enabling Conservative government tax cuts, covering the costs of unemployment, and paying off a historic balance of payments problem (The Herald, 14/06/2020).
  • THE safety of North Sea workers is at risk over oil industry plans to rip up a vital sector-wide agreement, a trade union has warned (Daily Record, 05/03/2020).

Interestingly, most mentions of the North Sea in regional newspapers (74%) come from the two newspapers based in Aberdeen (Aberdeen Press and Journal and Aberdeen Evening Express). References to renewable energy are prominent in the Regional Corpus only, as indicated by the keyword ‘offshore’ (see Table 3), which uncovered mentions of the generation of renewable energy through offshore wind farms (Extract 5). This demonstrates how the sea is considered an important source of revenue (in particular via job creation) in communities that have traditionally depended on the sea for income generation. In contrast, national newspapers frequently mention mining of the deep sea for minerals (see the collocations of ‘sea’ with ‘deep’, Figures 4 and 5), especially in relation to campaigns to halt deep-sea mining given its serious environmental impacts (Extract 6).

  • A predicted boom in North Sea offshore wind jobs was branded “a pipe dream” by union bosses after the Scottish Government admitted it only uses “estimates” of current employment figures in the sector (Aberdeen Press and Journal, 20/02/2020).
  • Sir David Attenborough has backed calls to halt deep-sea mining for minerals that are in high demand for use in items such as mobile phones (The Daily Telegraph, 13/03/2020).

The Irish Sea is also frequently mentioned in the three corpora (see Figures 4-6). Here the discussion revolves around the negotiations between the UK and the EU in the context of Brexit, as these entailed a regulatory border between Great Britain and Northern Ireland for the crossing of goods (Extracts 7-8). It is interesting to note that 62% of the mentions of ‘Irish Sea’ in the Regional Corpus came from the Belfast Telegraph.

  • Animal products arriving across the Irish Sea from Great Britain – including meat, milk, fish and eggs – will have to enter through a border control post where paperwork will be checked and a proportion of goods will be inspected. (The Daily Mail, 10/12/2020)
  • But no matter how gently this was presented, an Irish Sea border has become a living reality, courtesy of Boris Johnson. (Belfast Telegraph, 16/12/2020)

Finding 2: The ‘marine environment’ (and not the ‘sea’ itself) is represented as a natural resource that must be preserved and protected, especially in view of sustaining the economic benefits from the sea.

The sea is represented as a natural resource to be preserved. However, this is seen through the collocations of the keyword ‘marine’ (see Tables 2 and 3) rather than through the word ‘sea’ itself. ‘Marine’ collocates with words such as ‘protected’, ‘conservation’, ‘ecosystems’ and ‘environment’ across the three corpora (Figures 7-9), uncovering mentions of initiatives and campaigns to protect marine life (Extract 9).

  • Conservation charity Blue Marine Foundation called for one of the first pilot sites to be established in Wembury Bay, in the Plymouth Sound National Marine Park, to protect its varied marine life and habitats and to help connect people to the sea (Western Daily Press, 08/06/2020).

Figure 7: Collocates of ‘marine’ in the National Broadsheet Corpus

Figure 8: Collocates of ‘marine’ in the National Tabloid Corpus

Figure 9: Collocates of ‘marine’ in the Regional Corpus

Concern about preservation of the sea ecosystem is also seen through the analysis of the keywords ‘boats’ or ‘vessel(s)’ (cf. Means of Transportation in Tables 2 and 3). The reporting revolves around campaigns against supertrawlers fishing in UK waters, especially in protected areas, as they have a negative impact on fishing ‘villages’ and ‘communities’. In the National Broadsheet Corpus, this narrative is also seen through the collocations of ‘fishing’ with ‘sustainable’ and ‘protected’, which unveiled references to sustainable fishing (Extract 10).

  • Marine wildlife monitors say the vessels are destroying fish stocks, killing non-target species, harming sustainable fishing communities and destroying marine ecosystems (The Independent, 12/12/2020).

Finding 3: Newspapers frequently stress the negative impacts of climate change on the sea.

Climate change is a prominent theme in the discourse of both national and regional newspapers, as indicated by the association of ‘sea’ with ‘level(s)’ across the three corpora, and with ‘rising’, ‘rise’ and ‘ice’ in the National Broadsheet Corpus (see Figures 4-6). The newspapers frequently mention rising sea levels due to higher global temperatures (Extract 11). The National Broadsheet Corpus specifically mentions the decline of sea ice cover in the Arctic Ocean due to climate change. This is evident through the collocations of the keyword ‘ocean’ with words such as ‘Arctic’, ‘temperatures’ and ‘warming’ (Extract 12).

  • Sea level rises have accelerated in recent decades, threatening coastal areas and low-lying land by 2100 (The Express, 16/05/2020).
  • Ocean temperatures in the area recently climbed to more than 5C above average, following a record breaking heatwave and the unusually early decline of last winter’s sea ice (The Guardian, 22/10/2020).

Finding 4: An ‘emotional’ lexicon can only be found in relation to aesthetic considerations.

The keyword ‘sea’ collocates with ‘view(s)’ across the three corpora (see Figures 4-6), uncovering descriptions of places with ‘panoramic’, ‘stunning’, ‘incredible’ or ‘superb’ sea views. This relates to the value of a sea view in the hospitality sector (hotels, accommodation, restaurants, etc.) as well as in private properties (Extract 13).

  • The property is spread over five floors and has key features such as a built in library, spectacular formal reception room and stunning sea views (The Argus, 12/09/2020).

Finding 5: A sense of place can be found in relation to the seashore/coastal locations.

A clearly distinctive feature of regional newspapers is the prominence of place names (cf. Geographical References, Table 3). This is interesting because these draw attention to places close to the seashore. Through the analysis of the keyword ‘harbour’, we uncovered mentions of seaside towns with historic walled harbours where boats line up. Another feature specific to the Regional Corpus relates to Rescue services (Table 3), which refer to the services provided by local councils or charities in case of emergencies to ensure people’s safety on the beach or to rescue animals and objects from the sea (Extract 14).

  • On average, the Redcar lifeboat is called out between 50 and 70 times a year, with the RNLI nationally being involved in nearly 10,000 emergencies in 2019 (The Northern Echo, 21/10/2020).

The set of keywords in Table 4 provides further evidence for the prominence of Geographical References and Rescue Services in the Regional Corpus. The occurrence of the keyword ‘council’ is not surprising given the composition of the corpus (local newspapers). There are several mentions of councils’ planning and services, including those to ensure safety at the seaside. The keyword ‘pier’ corroborates the trend indicated by ‘harbour’ by highlighting people’s connection to the sea through sports and leisure activities in seaside towns and counties (Extract 15).

  • Along with the pier’s existing aquarium and new rollercoaster, a £4m investment has added an indoor and outdoor adventure golf course as well as a children’s soft play area, covering more than 30,000 sq ft (East Anglian Daily Times, 01/02/2020).

6. Conclusions and recommendations

The dominant narrative in the British written press represents the sea as an economic resource and, at the same time, as a ‘marine environment’ to be protected and preserved. The sea is also recurrently represented as in need of more regulation (especially in the context of Brexit and the migration crisis). We found some ‘emotional’ words in reference to ‘sea views’, but the dominant narrative is clearly one of utilitarianism and opportunism: in other words, the sea must be protected because it is useful, and not so much because we have any sense of belonging and connection to the sea. This fits with a weak conception of sustainable development that prioritises economic needs over environmental preservation (while trying to find a balance between these two necessities). The representation of the sea in British media demonstrates that there is an awareness of the benefits of the sea for livelihood and a need for the marine environment to be protected. But what is lacking is the conveyance of a real sense of place and belonging.

This has important consequences regarding ocean awareness. Indeed, having citizens worry about the sea because they understand that it is economically important for them is certainly a good beginning, but this remains within the frame of a weak sustainability approach. Ocean sustainability (i.e. recognising the utter importance of the environmental and social dimensions of oceans) requires a stronger emotional connection with the sea. The narrative around the sea is too utilitarian/opportunistic and not emotional enough. This contributes to a lack of sense of belonging and to the valuing of oceans solely for their economic importance. In sum, our findings show that public policy stakeholders who want to further develop ocean awareness among the wider public need to contribute to the promotion of a narrative about the sea that is not just utilitarian (revenue, job creation) but also emotional.

Celebrating the Written BNC2014: Lancaster Castle event

On 19 November 2021, the ESRC Centre for Corpus Approaches to Social Science (CASS) organised an event to celebrate the launch of the Written British National Corpus 2014 (BNC2014). The event was live-streamed from a very special location: the medieval Lancaster Castle. There were about 20 participants on site and more than 1,200 participants joined the event online. Dr Vaclav Brezina started the event and welcomed the participants from over 30 different countries. After the official welcome by Professor Elena Semino and Professor Paul Connolly, a series of invited talks was delivered by prominent speakers from the UK and abroad. The talks covered topics such as corpus development, corpora in the classroom, corpora and fiction and the historical development of English.

The BNC2014 is now available together with its predecessor, the BNC1994, via #LancsBox X.

#LancsBox X interface

More information about the design and development of the Written BNC2014 is available from this open access research article:

If you missed the event, we offer the recording of the individual sessions below. You can also view the pdf slides about the Written BNC2014.

Online programme: Lancaster Corpus Linguistics
Vaclav Brezina, Elena Semino, Paul Connolly (Lancaster University): Welcome and Introduction to the event
Tony McEnery (Lancaster University): The idea of the written BNC2014
Dawn Knight (Cardiff University): Building a National Corpus: The story of the National Corpus of Contemporary Welsh
Vaclav Brezina and William Platt (Lancaster University): Current British English and Exploring the BNC2014 using #LancsBox X
Randi Reppen (Northern Arizona University): Corpora in the classroom
Alice Deignan (University of Leeds): Corpora in education
Dana Gablasova (Lancaster University): Corpus for schools
Bas Aarts (University College London): Plonker of a politician NPs
Marc Alexander (University of Glasgow): British English: A historical perspective
Michaela Mahlberg (University of Birmingham): Corpora and fiction
Martin Wynne (University of Oxford): CLARIN – corpora, corpus tools and collaboration
Vaclav Brezina (Lancaster University): Farewell

Meet our students: Masters in Corpus Linguistics

Lancaster University is very proud to offer MA and Postgraduate Certificate programmes in Corpus Linguistics. The programmes aim to equip students with skills that will enable them to analyse large amounts of linguistic data (corpora) using cutting-edge computational technology.

We asked our future students a few questions about their interests and motivation to study at Lancaster.

Alexandra Terashima: “Applying for this program represents a major pivot in my life.”

Hello! My name is Alexandra Terashima and I’ve recently been accepted into the Corpus Linguistics (Distance) MA program. I am originally from Russia, but I grew up and studied in the United States, and currently I am living in Japan.

I feel incredibly grateful to have been selected to receive a bursary to support my studies towards an MA in Corpus Linguistics.

Can you tell us a little bit about your background and research interests?

Applying for this program represents a major pivot in my life—I already have a PhD in genetics and worked for several years as a researcher in a lab. But something was missing for me and a few years ago,  I stepped away from the bench and turned towards the communication side of science, spending a few years helping scientists edit and revise papers for publication, which led to my current position, teaching academic writing to English language learners. 

My research interests include language acquisition, in particular how learners of English acquire knowledge of formulaic language, such as collocations and multi-word phrases, particularly ones that are used in specific genres of writing, such as scientific literature.

Why have you applied to study MA in Corpus linguistics at Lancaster University?

While I am perhaps not a traditional MA program student, I applied to this program after careful consideration of my future career goals. During my time as a biology researcher, I was fascinated by the fact that, while scientific articles play a big role in the career of a scientist, the conventions of how to write scientific articles are not taught to science students at either the undergraduate or graduate level. Instead, students are expected to learn how to write from their supervisor and other lab members.

When I worked as an in-house editor at a research institute, I saw first-hand how the quality of writing can influence an editor’s response to and reviewers’ comments on a submitted manuscript, regardless of the quality of the scientific findings. Through working closely with scientists to help them improve their papers for publication, I became interested in education, and five years ago started working at the University of Tokyo, teaching academic writing to undergraduate students. Also five years ago, I was introduced to corpus linguistics at an English for Specific Purposes conference where I heard talks by Laurence Anthony and Paul Thompson. The methodology of systematic analysis of language for patterns appealed to me and I began exploring this area of research in the context of my teaching. My career goal is to have a position in academia that combines teaching, research and supervision of graduate students, but I feel that I need additional qualifications to achieve my goals. I have been contemplating an MA in applied linguistics for several years as a way to acquire research training and qualifications in this field. In parallel, I became aware of Lancaster University as one of the leaders in the field of corpus linguistics by reading literature and taking part in the Corpus Linguistics MOOC on FutureLearn. Last fall, when I saw the announcement for this new distance MA program in corpus linguistics, I knew it was time to apply!

Can you tell us a little bit about the topic you have selected for your MA dissertation?

Because of my strong interest in formulaic language, the topic of my MA dissertation focuses on the use of corpus analysis tools to measure and visualize phraseological development in spoken L2 English. In particular I will explore whether different levels of L2 proficiency can be distinguished by differences in the knowledge of collocations and if so, what statistical measures for identifying collocations are most effective. This project will utilize the Trinity Lancaster Corpus, which in addition to being the largest spoken learner corpus of its kind, is rich in metadata, which allows users to quickly access the data of interest, such as the samples from different levels of L2 proficiency. I will also need to learn my way around #LancsBox for this project, which no doubt will be an invaluable tool in my future research.

Why have you selected this topic?

As a lifelong language learner, I am fascinated by how people acquire language, are taught language and ultimately, how they use language. I believe formulaic language, namely collocations and collocation networks, is one of the cornerstones of language study that can help improve learner motivation and accelerate the understanding of an L2 language.

I selected this topic because I am intrigued by the challenge of distinguishing collocational knowledge at different levels of L2 proficiency. I recognize the importance of such distinctions for developing assessment tools and graded teaching materials. It is also reasonable to assume that learners acquire L2 proficiency in different ways and so defining the borders between different levels of L2 English proficiency in terms of collocation knowledge is a challenging and useful endeavor, one that goes a step beyond vocabulary and grammar knowledge assessment.

What are your plans for the future?

For my future research, I would like to focus on formulaic language, specifically language used in scientific papers. I would like to help establish conventions to teach science paper writing systematically to undergraduate and graduate students to bridge the gap for scientists struggling to publish due to the poor writing skills of their supervisor or due to being a non-native English speaker. Most of the current literature analyzing scientific papers has, understandably, been produced by linguists. While these studies provide many useful insights, I feel that their lack of understanding of scientific research culture as well as the culture of scientific publishing doesn’t allow them to fully capture the dynamic and evolving nature of the language of scientific publications. I believe that my background as a scientist can help bridge this gap and help expand this genre of linguistic research.

Lee Daniels: Corpus linguistics at Lancaster is “a fantastic opportunity for me!”

Hi there! My name is Lee Daniels, and I am a bursary holder for the Corpus Linguistics MA at Lancaster University.

I am a 28-year-old North Yorkshireman turned Mancunian, who has lived in Salford for the past seven years. I have just completed my B.A. (Hons) Linguistics undergraduate degree with Manchester Metropolitan University, and I am incredibly excited for this fantastic opportunity with Lancaster University!

So! Let me tell you a little bit about myself in the form of a mini-interview.

Can you tell us a little bit about your background and research interests?

I began my higher education relatively late, that is, it was not until the age of 25 that I entered Manchester Metropolitan University (MMU) as a mature student studying Linguistics and Italian. Prior to this I was working as a Third-Party Liability and Credit-Hire Motor Claims Handler. However, for a multitude of reasons, I decided that this career path was not for me and I wanted to dedicate my efforts to something where my passions lay. That passion was (and still is) any and all things Linguistics! Subsequently, I studied, paid for, and completed the qualifications needed (iGCSE and A-Level Italian) to gain entry into university and develop these passions further.

Through three fantastic years of study at MMU, I honed these passions into particular research interests, that is, via the sub-disciplines of cognitive linguistics, pragmatics (with a dash of semantics) and corpus linguistics (go figure!). Particularly, my interests lie in the combination of these three areas. For as I argue in my undergraduate dissertation research, isolating language conceptualisation from the real-world context in which it is found is counterintuitive. Thus, in line with an emerging socio-cognitive sub-discipline, my interests lie in intertwining conceptual and pragmatic processes which may influence unique language conceptualisations, and thus, language output.

I have found that the application of the corpus linguistic methodology, with its ever-developing capabilities thanks to ever-emerging new technology, provides a fantastic opportunity to offer some substantiation or refutation of such claims (although I hope the former!). The integration of these interests is something that I have initiated in my dissertation project and is something that I would love to continue to pursue throughout my academic career.

Why have you applied to study MA in Corpus linguistics at Lancaster University?

Lancaster has not only one of the best Linguistics departments in the world, but also, the corpus work coming out of the institution is at the cutting edge of the discipline. During my time at MMU, I often utilised the corpus work of Lancaster scholars to demonstrate the benefits and applicability of its methodology be it through Baker, Brezina, McEnery, Hardie, Semino, Culpeper (and many more). I had thus quickly learned of Lancaster’s position at the forefront of the field.

I have also had the pleasure of working with some Lancaster alumni, such as Professor Dawn Archer and Dr Sean Murphy, on a corpus-led research project looking at Shakespeare’s representation of gender in his works. This was via the utilisation of the Enhanced Shakespearean Corpus (ESC) and CQPweb (developed at Lancaster). Additionally, I enjoy a fantastic and productive working relationship with Dr Lexi Webster, which I hope will continue for many years and produce exciting work. Above all, I applied to Lancaster because I want to contribute to, and be associated with, the incredible work and people that are associated with the institution.

Can you tell us a little bit about the topic you have selected for your MA dissertation?

I have selected to study disagreement strategies in spoken L2 English (English not as a native language). This study will utilise the Trinity Lancaster Corpus (TLC) developed at The ESRC Centre for Corpus Approaches to Social Science (CASS), Lancaster University in collaboration with Trinity College London. TLC contains the largest body of spoken L2 English across all corpora and is thus best placed for the application of this MA dissertation piece. The topic selected allows the analysis of a complex pragmatic process (disagreement) through empirical means, whilst at the same time, complementing it with in-depth qualitative analysis. The subsequent findings obtained from this analysis may then enhance our understanding of second language pragmatic abilities, communicative strategies in language testing, and may thus contribute to greater understanding and improved practice within TESOL/TEFL contexts.

Why have you selected this topic?

What drew me in to this topic was the opportunity to provide great insight into a pragmatic communicative strategy; it also allows me to explore my research interests. That is, the project allows me to further explore the conceptual/contextual practices that are behind pragmatic strategy constructions.

Using corpora to provide substantiation for such a complex pragmatic phenomenon also falls in line with my interests. I think we are in an exciting time for Linguistics because the technology associated with corpora is only getting better and more capable. With that expansion, all sorts of new research may be attempted into complex phenomena (like L2 English disagreement strategies!) that was previously not feasible. Therefore, to be at an institution that fully resonates with this thinking is a fantastic opportunity for me!

What are your plans for the future?

More Linguistics! In other words, my aim is to become a Lecturer within the field. In addition to having a passion for the Linguistic discipline, I also love rambling on about it (if you have not guessed already)! I developed this at MMU by applying it in a teaching capacity in both paid and voluntary roles. I find teaching a topic that I am genuinely passionate about, and trying to stir that same passion in others, to be incredibly rewarding. To reach this goal I need to acquire my PhD and would love this to be at Lancaster via a similar corpus-led opportunity. It will require a lot of hard work, but I am as committed now as I was on day one when I started this incredibly rewarding journey!

Launch of new project – Questioning Vaccination Discourse (Quo VaDis): A Corpus-based study

A new ESRC-funded project based in CASS will apply the methods of corpus linguistics to arrive at new understandings of vaccine hesitancy, which the World Health Organization lists among the top 10 global health challenges, and defines as ‘a delay in acceptance or refusal of vaccines despite availability of vaccination services’.

Vaccine hesitancy is often a consequence of views and attitudes that are formed and exchanged through discourse, for example by reading the news, listening to politicians and interacting on social media. The ‘Quo VaDis’ project (Questioning Vaccination Discourse) will employ corpus linguistic methods to study systematically the ways in which vaccinations are discussed, both currently and historically, in the UK press, UK parliamentary debates, and social media (Twitter, Reddit and Mumsnet). The goal is to arrive at a better understanding of pro- and anti-vaccination views, as well as undecided views, and to use the findings to inform future public health campaigns about vaccinations, in collaboration with public health agencies. For more information: https://www.lancaster.ac.uk/vaccination-discourse/ Twitter: @vaccine_project

William Dance – Introductory Blog

My name is William Dance and I’m one of two new Senior Research Associates in CASS.

I’m currently finishing my PhD in the linguistics department here and my main research interests are corpus approaches to deception and manipulation, using methods like (critical) discourse analysis to study online disinformation (better known as ‘fake news’).

I’m working alongside Tara Coltman-Patel on the new ESRC-funded ‘Questioning Vaccination Discourse’ Project (or Quo VaDis – Latin for ‘Where are you going?’). Alongside collaborators from Public Health England, UCL, and University of Leeds, the project looks at how the public, press, and policymakers speak and write about vaccinations both online and offline. The goal of the project (which believe it or not was submitted before the COVID-19 pandemic!) is to get a better understanding of how pro- and anti-vaccination views spread online, as well as how the vaccine uncertain people in the middle express their views.

I’ve found myself over the last few years researching topics just as they seem to gain global attention. I started researching disinformation during my Masters just as Donald Trump was elected president and “fake news” became a hot topic. Similarly, I joined the Quo VaDis project just as a global pandemic began and vaccination became more important than ever before.

My research into disinformation has given me some amazing opportunities over the past few years. I’ve had the fortune to do things like present my research to parliamentarians, be seconded to Whitehall for three months, and work with over 50 news organisations and state broadcasters to disseminate my research and help inform the public about online deception. This kind of external engagement is a theme throughout all of my work and I always try to reach out to communities outside of academia whenever I can. I also run a blog which you can find here.

Disinformation is a wide-reaching topic and my research on this has mainly focused on areas such as social media users’ motivations for sharing disinformation, analysing hostile-state information operations (HSIOs), with future publications focusing on exploring algorithmic disinformation and the spread of online disinformation.

Outside of work, one of my favourite hobbies is baking. This is something I do most evenings and weekends as I enjoy planning and writing recipes, and then baking things for friends and family (although I enjoy the washing up a lot less…). I’ve been baking and cooking pretty much since I could walk as I was taught to cook from a young age. You can see some of my creations here but my favourite thing to bake is bread.

I think the best way to end this introduction is just to say how much I’m looking forward to what the Quo VaDis project, and working in CASS in general, has to offer. I’m grateful to be working in one of the best corpus research centres in the world and I can’t wait to see what the next three years bring.

Tara Coltman-Patel – Introductory Blog

My name is Tara Coltman-Patel and I am so excited to be a new member of CASS.

I am working as one of the Senior Research Associates on the ESRC-funded Quo VaDis project (Questioning Vaccination Discourse: A Corpus-Based Study), which explores discussions about vaccinations in UK parliamentary debates, UK national newspapers and on the social media sites Twitter, Reddit and Mumsnet. Using a variety of corpus tools and techniques, we aim to gain a better understanding of the wide spectrum of pro-, anti- and undecided views surrounding vaccinations. Analysing how vaccinations are discussed across a variety of contexts, how the different views are communicated, and how people with different views interact, particularly on social media, will be invaluable for addressing vaccine hesitancy. With our results we aim to inform, facilitate and help design future public health campaigns about vaccinations. As vaccinations are a salient topic, especially given the time we are currently living through, I am extremely grateful to have the opportunity to work on this research.

Before joining CASS I was working at Nottingham Trent University, where I recently finished my PhD which focussed on weight stigma and the representation of obesity in the British Press. In doing so I explored how metaphors can sensationalise and dehumanise people with obesity, I explored how science is recontextualised and misrepresented, and I explored the linguistic strategies of representation used in personal stories about weight loss. I am currently in the process of turning that research into a book titled ‘(Mis)Representing Obesity in the Press: Fear, Divisiveness, Shame and Stigma’, which will hopefully be published towards the end of 2022. Weight discrimination is a topic I am incredibly passionate about and in addition to research I have also worked as an anti-weight discrimination advocate and have consulted on global campaigns with the World Obesity Federation.

Outside of research I am a massive bookworm and I love to read; I’m obsessed with RuPaul’s Drag Race and I’m also a sucker for a nice beer garden. Before Covid I loved to travel and have backpacked around Australia, Thailand, The Philippines, Mauritius and South Africa. I have some amazing and memorable moments from those trips, from bad ones like falling off a (small) cliff in Mauritius and being bitten on my hand by a spider in Australia, to incredible ones like canyoneering in The Philippines and swimming with sharks in Australia and South Africa. Sharks are my favourite animal and I have a plethora of fun facts about them ready to share at any given moment, so you definitely won’t regret inviting me to parties …

To conclude, I’m really thrilled to be a part of CASS and the Quo VaDis project, and as I have run out of interesting things to say about myself, I’ll end this blog post here.

#LancsBox: The emerging historical linguist’s MO? A brief case study of Aramaic.

By: Charbel El-Khaissi

I took Lancaster University’s free Corpus Linguistics course (Corpus MOOC) to fill time. Three months later, a doctoral research proposal enabled by #LancsBox, a software tool introduced in the course, was accepted at the Australian National University.

For as long as this topic has been studied, ancient Semitic languages have relied on classical philological approaches. Naturally, a tension exists between this tradition and contemporary approaches in computational linguistics. It would be unfair to characterise this divide as a mere consequence of ‘old-school’ scholars resisting technological changes in research because philology is an inherent part of the study. The study of any ancient language requires far more human involvement than a machine can achieve: a careful hand to conserve and restore manuscripts, a keen eye for epigraphic analysis and a well-rounded, learned mind to interpret literature in medias res, politically, theologically and societally. However, as far as the researcher is open to computer-assistive technology, #LancsBox fills a much-needed gap in historical linguistics, especially in the field of Semitic historical syntax.

As a case in point, consider the Aramaic language: the longest continuously spoken Semitic language, with an attested lifespan of approximately 3,000 years. This language offers linguists intriguing insights into how human languages change over a substantial time period, including changes in their underlying structure (i.e., grammar and syntax). If these changes are substantiated, then their insights may lend important cues concerning the evolution of human cognition itself. Yet the historical syntax of Aramaic remains largely underrepresented and understudied. A few commendable scholars have undertaken the task of analysing developments in areas of Aramaic grammar (e.g., Huehnergard, 2005; Rubin, 2005; Grassi, 2009; Pat-El, 2012; Coghill, 2012). Among other reasons, the lack of rigorous study in this discipline is due to the labour-intensive task of qualitatively analysing large corpora. This task is made more difficult by a manual transcription and grammatical tagging process, in addition to administration duties such as record management and categorisation. Recent advancements in Aramaic computational linguistics, including but not limited to Handwriting-text Recognition (HTR) technology and digital archives, have significantly reduced the time needed for text transcription and tagging. However, the diachronic analysis of large corpora remains tedious without free, user-friendly and accessible corpus software like #LancsBox.

My doctoral research is among the first studies in Semitic historical linguistics to experiment with Lancaster University’s #LancsBox corpus software and analyse Aramaic syntax over time. Thus far, it has proven to be an exceptional tool for data management and diachronic analysis (see Figure 1 and Figure 2):

• Corpus management: the ease of creating, storing and analysing (sub-)corpora based on variables of interest (e.g., by dialect, century, author) reduces administrative overhead and gives me more time to test different hypotheses according to multiple variables.

• POS-tagging: in addition to offering POS tagging in a number of languages, #LancsBox caters to self-tagged corpora. This means I can import datasets that have been annotated according to my own tagging scheme, which gives me flexibility when testing the robustness of tag sets according to various theoretical frameworks.

As with any computer software, a few caveats are worth mentioning to historical Semitic linguists interested in using the software for their research.

• Coding: basic knowledge of Regular Expression coding is needed to execute meaningful, in-context searches.

• Font: in its current version (5.0), Aramaic is partially supported, with some fonts appearing disconnected. This makes in-tool legibility difficult, but not impossible.

• Text-direction: in its current version (5.0), Aramaic texts appear reversed (e.g., “cat” appears “tac”). Current workarounds include (1) using free, online tools to reverse the text prior to import, or (2) conducting analysis outside the tool.
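
As a rough illustration of workaround (1), the reversal could also be done locally with a few lines of Python before import. This is only a sketch and not part of #LancsBox: the function names are hypothetical, and keeping combining marks (such as vowel points) attached to their base letter is my own assumption about what vocalised texts would need.

```python
import unicodedata


def reverse_line(line: str) -> str:
    """Reverse one line, keeping combining marks attached to their base letter."""
    clusters = []
    for ch in line:
        # A combining mark (e.g. a vowel point) stays with the preceding character.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))


def reverse_text(text: str) -> str:
    """Reverse every line of a text independently, preserving line order."""
    return "\n".join(reverse_line(line) for line in text.split("\n"))


if __name__ == "__main__":
    print(reverse_line("cat"))
```

Reversing line by line keeps the line order of the manuscript intact, so concordance lines can still be traced back to the source text.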

Will #LancsBox become the MO for future historical linguists? Only time will tell. It seems to me the only accessible software currently available for linguists who wish to build and design their own corpus, especially in underrepresented and under-resourced languages. In fact, I can think of a number of innovative applications outside the research domain as well: for example, Australian linguists might be able to use #LancsBox to investigate which linguistic features have been declining in student writing over the last decade. Perhaps then #LancsBox’s core functionalities could help academics in other fields and a wider group of users.

Watch a 60-second video of Charbel El-Khaissi’s research here.

Acknowledgements: Thank you Professor Tony McEnery, Dr Pierre Weill-Tessier and Dr Vaclav Brezina whose innovations have enabled my research. I express gratitude to my supervisory panel for their ongoing guidance.

British Muslims Caught Amidst FOGs – A Discourse Analysis of Religious Advice and Authority

By Usman Maravia

In this blog entry, I will provide an overview of my latest article which explores the writing style of Islamic advice texts on COVID-19. The issues that were addressed in these advice texts were related to the topic of mosque closures, funerary rites, fasting during Ramadan, and suspending Friday and daily prayers to help curb the spread of COVID-19. These texts were being circulated in the UK in March and April of 2020, a crucial period wherein information was passed on to address issues that, in the scope of the study, British Muslims would face in Ramadan, which began on 25th April 2020.

The context

My interest in this topic was sparked by the unfortunate COVID-19-related death of an elderly Muslim from Walsall. A family member of the deceased stated in the press that “It is imperative that we learn from this tragic loss and comply with Government guidelines to save lives”. What further caught my interest was that, if the aim of the Islamic advice documents was to help Muslims stay safe during the pandemic, a unified and standardised message produced through collaboration between Muslim faith leaders and health professionals would have been helpful. Instead, a range of documents was in circulation, leading to ambiguity about exactly what preventative measures British Muslims were to take and where exactly the authority lay.

Moreover, the titles of these documents differed. Some were titled fatwa, which is a non-binding legal opinion of an Islamic legal expert, but still a document that could potentially carry much influence in Muslim communities in the UK. Some documents were written by healthcare professionals and were titled guidance documents; I wondered, do these documents carry the same weight as fatwas? Yet other documents were titled neither fatwa nor guidance but were written in a hybrid style of the two categories. Again I wondered, why were these words used in the titles?

The FOG corpus

As such, I sought to identify (a) the underlying reasons behind the titling of the documents; and (b) the construction of discourses in the documents. Together with my colleagues Zhazira Bekzhanova (Astana IT University, Kazakhstan), Mansur Ali (Centre for the Study of Islam in the UK, Cardiff University), and Rakan Alibri (University of Tabuk), I collected a total of 76 texts that were available online on websites of British mosques, Facebook pages and other online venues. We found that of these 76 documents, 14 were clearly titled fatwa. We also found that six documents were titled guidance documents, and an eye-catching 56 documents, which we refer to as other documents, included a range of words in their titles such as analysis, clarification, confirmation, guidelines, method, pathway, permissibility, plan of action, points, recommendation, response, ruling, and statement. This classification led to our jocular acronym FOG, i.e., fatwas, other documents, and guidance documents. This compilation then led to the creation of the specialised FOG corpus consisting of around 110,000 words.

We examined these written electronic texts in the social context of Muslims and COVID-19 in the UK. We explored the way language was used in real life in fatwas, guidance documents, and other documents. We then focused on the way the authors of these documents differed in their writing styles to create a certain impression on the audience by increasing, in Bourdieu’s terms, symbolic capital. We also drew on the representation of social actors (van Leeuwen, 1995) to decipher power relations across the FOG documents, analysing and interpreting references both to the text producers and to the audience through the prism of authority.

Corpus methods

We applied corpus-assisted critical discourse analysis, which helped us to uncover important patterns in relation to FOGs. Using AntConc software, we analysed the frequency of words, word lists, lexical bundles, collocations, concordance plots, and concordances to detect linguistic patterns in the FOG corpus. Corpus methods also gave us the tools to detect power hierarchies and inequalities within the texts. Moreover, our corpus-assisted study supports Brookes and McEnery’s observation that texts do acquire symbolic capital through an accumulation of patterns of textual cohesion and rhetorical strategies. We found that the documents appear to follow an underlying hierarchy among British Muslim scholars.

Findings

To elaborate, a particular writing style can be found across the FOG documents. We found fatwas and guidance documents to be textually diametric, whereas other documents featured greater intertextuality and maintained respect for the authority of muftis and their fatwas, albeit with reservations. The fatwas were found to be written by senior muftis and contained important references to the Qur’an and Muhammad, the Prophet of Islam. Fatwas also included legal terminology in Arabic related to Shariah law. Moreover, fatwas contained phrases such as ‘according to’ and ‘Allah knows best’.

Such a writing style is in accordance with the traditional writing style of fatwas and thereby holds higher symbolic capital. On the other hand, guidance documents were produced by healthcare professionals and did not contain such theologically related phrases but rather relied on scientific and medical language. Interestingly, we found the other category of documents to be written in a hybrid style of fatwas and guidance documents. Such a writing style appears to increase the symbolic capital of these documents and to empower the writers to challenge existing fatwas, whilst maintaining respect for senior muftis.

While the FOG documents reveal that multiple voices are welcome in addressing a national emergency, we recommend that a standardisation of documents, issued in collaboration with the NHS and senior muftis, could perhaps give a clearer action plan for British Muslims in future. As such, this study is intended to give an impetus to social scientists to explore the discourse of British Muslims and COVID-19 through a linguistic lens.

Our article is available to read in MDPI’s open access journal Religions. Additionally, further research is being carried out on the topic of COVID-19 by the British Islamic Medical Association (BIMA) as part of ‘Operation Vaccination’.

For my article on addressing vaccine resistance from an Islamic perspective, please read Vaccines: religio-cultural arguments from an Islamic perspective published by JBIMA.

‘Face masks’ and ‘face coverings’ in the UK press during the Covid-19 pandemic: Scottish vs. national newspapers

Carmen Dayrell, Isobelle Clarke and Elena Semino (Lancaster University)

1 Introduction

Since the beginning of the Covid-19 pandemic, the use of face masks or face coverings as a means of reducing the transmission of the virus has been a major area of debate in many countries around the world. In the UK specifically, the first nine months of 2020 saw a rapid change from a view of face masks as a medical piece of PPE that would not be appropriate or acceptable for the general population, to the establishment of non-surgical face coverings as a recommended public health measure in indoor public spaces, such as buses and supermarkets. As with other aspects of the response to the pandemic, during that time there were differences in the approach to face masks/coverings between the Scottish devolved administration and the Westminster government.

Table 1 provides a timeline summary of policy decisions concerning face masks/coverings on public transport, shops and schools in Scotland and England. For the most part, in Scotland face coverings were recommended or made mandatory earlier than in England. They are also mandatory in corridors and communal areas in Scottish schools, whereas in England this is at the school’s discretion.

| Month | Public transport | Shops | Schools |
|---|---|---|---|
| April | (28th) Scotland (recommended) | (28th) Scotland (recommended) | |
| May | (11th) England (recommended) | (11th) England (recommended) | |
| June | (15th) England (mandatory); (22nd) Scotland (mandatory) | | |
| July | | (10th) Scotland (mandatory); (24th) England (mandatory) | |
| August | | | (31st) Scotland (mandatory in corridors and communal areas) |
| September | | | (1st) England (school/college discretion in indoor communal areas) |

Table 1 – Timeline of policy decisions about the wearing of face coverings by the general public in Scotland vs. England.

Scotland has also had a lower incidence of Covid-19 than England. According to official UK government data, as of 30th December 23 people per 1,000 had had at least one positive Covid-19 test in Scotland, in contrast with 39 people per 1,000 in England.

This blog post is concerned with references to face masks and face coverings in Scottish vs. national UK newspapers between December 2019 and August 2020, that is from the start of reports about a new type of pneumonia in Wuhan, China, up to the beginning of the 2020-21 school year in the UK.

2 Research questions

Overarching research question

How does press reporting on face masks and face coverings in Scotland compare with national UK reporting between December 2019 and August 2020?

Specific research questions

  1. How did the frequency of use of ‘face covering(s)’ vs. ‘face mask(s)’ change over time in Scottish vs. national press reporting?
  2. Were there any statistically significant differences in the relative frequencies of the use of ‘face mask(s)’ and ‘face covering(s)’, and of terms relating to places where face masks/coverings may be used, in Scottish vs. national press reporting?
  3. What are the differences and similarities in the collocations (co-occurrence of words) of ‘face mask(s)’ vs. ‘face covering(s)’ in Scottish and national press reporting?

3 Findings in brief

Finding 1 – Over time, ‘face covering(s)’ became more frequent than ‘face mask(s)’ in the Scottish press, but not in the national press.

Finding 2 – ‘Face covering(s)’ are mentioned much more often, relatively speaking, in the Scottish press than in the national press, alongside other terms for public indoor environments where they may be worn.

Finding 3 – Face ‘mask(s)’ and ‘covering(s)’ have partly different collocates, reflecting differences in status and associated narratives.

4 Data

The news aggregator service LexisNexis was used to collect articles that contained either the phrase ‘face mask(s)’ or ‘face covering(s)’ and that were published in a selection of national and Scottish newspapers in the period between 01.12.2019 and 31.08.2020.

Table 2 provides the numbers of texts and words included in each of the resulting two corpora: the Scottish Corpus and the National Corpus. For the National Corpus, we also provide figures for articles extracted from ‘broadsheet’ vs. ‘tabloid’ newspapers, constituting the Broadsheet and Tabloid subcorpora. (NB: For the national newspapers specifically, we selected the national editions only, thus excluding the Irish, Scottish and Northern Ireland editions.) Figures 1 to 4 below show the number of articles per newspaper title within each corpus.

| Corpus | Number of texts | Number of words |
|---|---|---|
| National corpus | 11,536 | 19,401,316 |
| The Broadsheet subcorpus | 6,631 | 16,657,194 |
| The Tabloid subcorpus | 2,419 | 1,264,952 |
| Scottish corpus | 1,084 | 588,894 |

Table 2: Number of texts and total number of words comprising each corpus
Figure 1: Number of texts from each national title

Figure 2: Number of texts from each broadsheet title
Figure 3: Number of texts from each tabloid title

Figure 4: Number of texts from each Scottish newspaper title

The Broadsheet subcorpus is by far the largest of all datasets, both in terms of the number of texts and the number of words (Table 2). Within that subcorpus, The Guardian and The Observer account for the highest number of articles, corresponding to 36% of texts and 83% of the words in that subcorpus (13,744,333 out of 16,657,194). The number of texts is more evenly distributed in the Tabloid subcorpus (Figure 3). The Daily Mail accounts for the largest number of texts (20%) but it is closely followed by The Express, The Sun and Evening Standard (17% and 15% each respectively). Within the Scottish corpus, most texts come from The Daily Record and The National (32% each).

5 Method

To answer question 1, we plotted the frequencies of the search terms used to collect the texts that comprise the corpora, ‘face mask(s)’ and ‘face covering(s)’. These figures give us an indication of how the level of attention fluctuated in the National and Scottish press over time.

To answer question 2, we carried out a ‘keyword’ analysis of the Scottish Corpus as compared with the National Corpus as a whole. Keywords are words that are much more frequent in a corpus of interest (known as the ‘study corpus’) than they are in another corpus (known as the ‘reference corpus’), where the difference is statistically significant. They can be interpreted as reflecting the most distinctive concepts and themes in a particular corpus. The analysis was carried out using WordSmith Tools, version 7.

For the calculation of keywords, we established that the candidate keyword should occur in at least 5% of texts in the study corpus. This thus determined the minimum frequency of each term, which varied from one corpus to another. The minimum frequency was 577 instances in the National Corpus and 54 in the Scottish Corpus. In terms of statistical tests, we combined the log-likelihood test (a statistical measure of confidence) with log-ratio as the effect size measure, using the following threshold: a critical value higher than 15.13 (p < 0.0001) for the log-likelihood test and 1.5 as the minimum log-ratio score, discarding negative scores. Keywords were then grouped by theme through close reading of the concordance lines, that is, individual occurrences of each word with the preceding and following stretches of text.
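
The thresholds described above can be sketched in Python for illustration. The log-likelihood and log-ratio formulas below are the definitions commonly used in corpus linguistics; WordSmith Tools may compute them differently, so this is not a reproduction of the software's procedure. All function names and the example counts in the last lines are hypothetical, and rounding the 5% threshold to the nearest whole number is an assumption (it happens to reproduce the figures 577 and 54 from the corpus sizes in Table 2).

```python
import math

LL_CRITICAL = 15.13    # log-likelihood critical value stated in the text
MIN_LOG_RATIO = 1.5    # minimum log-ratio score stated in the text


def min_text_threshold(n_texts, share=0.05):
    """Threshold implied by the 5%-of-texts rule (rounding is an assumption)."""
    return round(n_texts * share)


def log_likelihood(a, c, b, d):
    """a, b: frequency in study / reference corpus; c, d: corpus sizes in words."""
    e1 = c * (a + b) / (c + d)   # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)   # expected frequency in the reference corpus

    def term(obs, exp):
        # A zero observed frequency contributes nothing to the sum.
        return obs * math.log(obs / exp) if obs > 0 else 0.0

    return 2 * (term(a, e1) + term(b, e2))


def log_ratio(a, c, b, d):
    """Binary log of the ratio of relative frequencies (assumes b > 0)."""
    return math.log2((a / c) / (b / d))


def is_keyword(a, c, b, d):
    """Apply both thresholds; negative log-ratio scores fail automatically."""
    return (log_likelihood(a, c, b, d) > LL_CRITICAL
            and log_ratio(a, c, b, d) >= MIN_LOG_RATIO)


if __name__ == "__main__":
    print(min_text_threshold(11_536), min_text_threshold(1_084))
    # Hypothetical word counts, real corpus sizes from Table 2.
    print(is_keyword(100, 588_894, 500, 19_401_316))
```

Combining a significance test with an effect size in this way filters out words whose difference is statistically reliable but too small to be of interest.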

To answer question 3, we carried out a ‘collocation’ analysis of the terms ‘face mask(s)’ and ‘face covering(s)’. Collocation analyses explore co-occurrence relationships between words, and therefore make it possible to study the narratives or discourses that a word is part of. A word collocates with another if it is more likely to be found in close proximity to the other word than elsewhere. Collocations were generated by means of the software package LancsBox, on the basis of the criteria below:

  • Span of 5:5 – a window of five words to the left and five words to the right of the search word.
  • Mutual Information (MI) score ≥ 6. MI is a statistical procedure widely employed in corpus studies to indicate how strong the association between two words is. It is calculated by considering their frequency of co-occurrence in relation to their frequencies when occurring independently in each corpus.
  • Minimum frequency of collocation: 10 occurrences per 1,000 instances of the term in question. For example, where ‘face mask(s)’ occurs 1,672 times in a corpus, the minimum frequency of collocation is 17 instances.
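
The three criteria above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not LancsBox's implementation: the MI formula is the basic version without any window-size correction, the search term is assumed to have been merged into a single token beforehand, rounding the minimum frequency to the nearest whole number is an assumption, and all function names are hypothetical.

```python
import math
from collections import Counter


def window_cooccurrences(tokens, node, span=5):
    """Count words within `span` tokens to the left and right of each node."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            counts.update(left + right)
    return counts


def mutual_information(observed, f_node, f_collocate, corpus_size):
    """Basic MI: binary log of observed over expected co-occurrence frequency."""
    expected = f_node * f_collocate / corpus_size
    return math.log2(observed / expected)


def min_collocation_freq(node_freq, per_thousand=10):
    """10 occurrences per 1,000 instances of the node (rounding is an assumption)."""
    return round(node_freq * per_thousand / 1000)


if __name__ == "__main__":
    # Worked example from the text: 1,672 instances of the search term.
    print(min_collocation_freq(1672))
```

A candidate collocate would then be retained only if its co-occurrence count reaches the minimum frequency and its MI score is at least 6.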

Similar to the analysis of keywords, collocations were analysed by close reading of their concordance lines.

6 Findings

Finding 1 – Over time, ‘face covering(s)’ became more frequent than ‘face mask(s)’ in the Scottish press, but not in the national press.

Figures 5-6 show the frequency distribution of the terms ‘face mask(s)’ and ‘face covering(s)’ in the two corpora across time, considering the relative frequencies of terms (per 100,000 words). Note that the scale varies from one chart to another; that is due to differences in the amount of data from each corpus.
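The normalisation used for these charts can be written as a one-line calculation. The sketch below is illustrative only: the function name is hypothetical, and the token count in the usage line is invented, since monthly corpus sizes are not given in this section.

```python
def relative_frequency(raw_count, corpus_tokens, per=100_000):
    """Occurrences per `per` words, so corpora of different sizes can be compared."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    return raw_count * per / corpus_tokens


# Hypothetical month: 50 mentions in 1,000,000 words of text.
print(relative_frequency(50, 1_000_000))
```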


Figure 5: Relative frequencies of ‘face covering(s)’ and ‘face mask(s)’ in the National Corpus

Figure 6: Relative frequencies of ‘face covering(s)’ and ‘face mask(s)’ in the Scottish Corpus

As can be seen, both corpora show a clear preference for the term ‘face mask(s)’ in the early months, from December 2019 to March 2020, with hardly any mention of the term ‘face covering(s)’. Scottish newspapers seem to have embraced the term ‘face covering(s)’ first: mentions increased swiftly in April 2020, amounting to nearly half of the number of mentions of ‘face mask(s)’ in that month (83 as compared with 181 instances). National newspapers showed only a modest increase in mentions of ‘face covering(s)’ in April, when ‘face mask(s)’ was still nearly six times more frequent than ‘face covering(s)’ (2,241 as compared with 386 instances). Mentions of ‘face covering(s)’ continued to rise across both corpora in the following months. In May, they represented about half of the number of mentions of ‘face mask(s)’ in the Scottish Corpus and about a third in the National Corpus. By June, mentions of ‘face covering(s)’ surpassed those of ‘face mask(s)’ in Scottish newspapers. In national newspapers, ‘face mask(s)’ remained more frequent than ‘face covering(s)’ across the entire period.
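The April 2020 proportions quoted above follow directly from the raw counts reported in the paragraph. The short check below uses only those four figures; the variable and function names are hypothetical.

```python
# Raw April 2020 counts as reported in the text.
april_counts = {
    "Scottish": {"covering": 83, "mask": 181},
    "National": {"covering": 386, "mask": 2241},
}


def covering_to_mask_ratio(counts):
    """Mentions of 'face covering(s)' as a share of mentions of 'face mask(s)'."""
    return counts["covering"] / counts["mask"]


for corpus, counts in april_counts.items():
    share = covering_to_mask_ratio(counts)
    # Scottish: about 0.46 ("nearly half");
    # National: about 0.17, i.e. masks roughly 5.8 times as frequent.
    print(f"{corpus}: {share:.2f} coverings per mask mention; "
          f"masks {1 / share:.1f} times as frequent")
```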

Finding 2 – ‘Face covering(s)’ are mentioned much more often, relatively speaking, in the Scottish press than in the national press, alongside other terms for public indoor environments where they may be worn.

The words ‘covering’ and ‘coverings’, which tend to occur in the phrase ‘face covering(s)’, were found to be ‘key’ or ‘overused’ in the Scottish Corpus as compared with the National Corpus. In other words, ‘covering’ and ‘coverings’ are used much more often, in terms of relative frequencies, in the Scottish Corpus than in the National Corpus, based on our thresholds for effect size (log-ratio) and statistical significance (log-likelihood). However, based on the same thresholds, the word ‘mask(s)’ is not overused in the National Corpus as compared with the Scottish Corpus. This means that ‘covering(s)’ in the Scottish Corpus is not in complementary distribution to ‘mask(s)’ in the National Corpus.

Overall, the keyword calculation retrieved 41 overused items in the Scottish Corpus, using the National Corpus as reference. Table 3 includes the complete list of keywords in the Scottish Corpus, grouped thematically and then ordered by their frequency of occurrence in the corpus.

Table 3 shows that the keywords in the Scottish Corpus include three other terms that are related to face coverings (‘mandatory’, ‘worn’ and ‘mouth’) as well as groups of words that relate to the different environments where face coverings may or may not be recommended or mandatory: Space (e.g. ‘indoor’, ‘outdoor’, ‘household’), Retail/hospitality (e.g. ‘shop’, ‘hospitality’) and Education (e.g. ‘pupils’, ‘teachers’).

Table 3: ‘Keywords’ in the Scottish Corpus, grouped by theme

The overuse of the word ‘kids’ reflects discussions about the age at which face masks/coverings should be made compulsory, as expressed by a reader’s comment published by The Glasgow Evening Times (Extract 1):

(1) “I AM so confused myself. Our kids are going with no distancing and in shops and malls and cinemas and public transport and airports. There is this hype of distancing. Which one is right? Are the poor kids so strong that they will not catch it at all and will not bring anything back home to their elderly grans etc? So illogical!” (The Glasgow Evening Times, 21.08.2020)

The keywords also include a group that is to do with Other Measures to reduce contagion, particularly in public spaces such as shops, restaurants and pubs (e.g. ‘screens’, ‘two-metre’). This is because face coverings are often presented as necessary when those other measures are not practicable:

(2) The government guidance says: “If you can, wear a face covering in an enclosed space where social distancing isn’t possible and where you will come into contact with people you do not normally meet.” (The National, 25.06.2020)

Finding 3 – Face ‘mask(s)’ and ‘covering(s)’ have partly different collocates, reflecting differences in status and associated narratives.

We now examine the collocates of ‘face mask(s)’ and ‘face covering(s)’ in the two corpora. These are listed in Tables 4 and 5, in decreasing order of frequency of co-occurrence with each term.


Table 4: Collocations of ‘face mask(s)’ and ‘face covering(s)’ in the Scottish Corpus

Table 5: Collocations of ‘face mask(s)’ and ‘face covering(s)’ in the National Corpus

Five words appeared as collocates of both ‘face mask(s)’ and ‘face covering(s)’ in both corpora. These are: three different forms of the verb ‘wear’ (‘wear’, ‘wearing’, ‘worn’), ‘compulsory’ and ‘mandatory’. These suggest that ‘mask(s)’ and ‘covering(s)’ are both used in the context of debates and decisions about the need or obligation to wear them in certain settings.

Figure 7: Instances of ‘face mask(s)’ in the Scottish Corpus

However, the collocates that only apply to ‘face mask(s)’ show that face masks tend to be talked about as a type of PPE in clinical or care settings (e.g. ‘protective’, ‘surgical’, ‘gloves’, ‘aprons’).

(3) Carers, many of whom are paid low wages by private sector firms, have complained they have not been provided with essential items such as hand sanitiser, gloves, aprons, and face masks. (The Independent, 24.03.2020)

In contrast, the collocates that only apply to ‘face covering(s)’ show that face coverings tend to be talked about as a non-medical item of clothing that is:

  • made of cloth and a potential fashion accessory or political statement (‘cloth’, ‘branded’);

(4) Currently no other party is selling branded face coverings, although many independent online shops stock masks with Union flag or political designs. (The National, 25.07.2020)

(5) Face coverings include scarves, a piece of cloth or a mask and certain travellers – such as people with disabilities or breathing difficulties – will be exempt. (The Daily Express, 06.06.2020)

  • recommended to be worn (e.g. ‘recommended’, ‘advised’);

(6) Earlier this week, First Minister Nicola Sturgeon recommended the limited use of face coverings – not necessarily masks – when social distancing is hard to maintain. (Glasgow Evening Times, 04.05.2020)

(7) Other precautions advised include wearing face coverings in public as much as possible, keeping two metres apart, avoiding physical contact with those outside one’s household and to be tested and isolate if told to do so. (The Telegraph, 18.07.2020)

  • in specific indoor public settings (‘crowded’, ‘enclosed’; ‘shops’, ‘transport’);

(8) “However, we are recommending you do wear a cloth face covering if you are in an enclosed space with others where social distancing is difficult – for example, on public transport, or in a shop.” (The National, 28.04.2020)

(9) It is compulsory to wear face coverings on public transport, in shops and when collecting takeaway food. (The Sun, 14.08.2020)

  • by large sections of the population (‘secondary’, ‘pupils’, ‘passengers’).

(10) A SECONDARY school is asking pupils to wear face coverings as part of efforts to combat the spread of coronavirus. (The Herald, 23.08.2020)

(11) Passengers have been told to wear face coverings on public transport to prevent a further outbreak of coronavirus as Britain slowly emerges from the lockdown. (The Times, 12.05.2020)

What does not, however, emerge from the collocates of ‘face covering(s)’ in either corpus is a consistent message about their role in protecting others from droplets produced by the wearer, thus reducing transmission overall. This may partly explain ongoing opposition to or scepticism about the usefulness of face coverings during the pandemic.

7 Conclusions

Overall, in the period December 2019 – August 2020, reports on face mask(s)/covering(s) in the Scottish press contrasted with the national press in terms of: a growing preference for ‘face covering(s)’ over ‘face mask(s)’ from April 2020 onwards; and a greater concern for their use to mitigate the transmission of the virus in schools, shops and other public indoor environments. This can only be partly explained by the fact that the Westminster government made decisions about the recommended/mandatory use of face coverings in public indoor spaces slightly later than the Scottish devolved administration. The contrasting collocates of ‘face mask(s)’ vs. ‘face covering(s)’ confirm that the two terms are associated with different settings and narratives: PPE in clinical/care settings vs. an item of clothing/accessory to be worn in public indoor environments by the general population as a public health measure. In the period under consideration, the latter narrative was therefore increasingly prevalent in Scottish but not in national newspapers.