#LancsBox: The emerging historical linguist’s MO? A brief case study of Aramaic.

By: Charbel El-Khaissi

I took Lancaster University’s free Corpus Linguistics course (Corpus MOOC) to fill time. Three months later, a doctoral research proposal enabled by #LancsBox, a software tool introduced in the course, was accepted at the Australian National University.

For as long as they have been studied, ancient Semitic languages have been approached through classical philology. Naturally, a tension exists between this tradition and contemporary approaches in computational linguistics. It would be unfair to characterise this divide as merely a consequence of ‘old-school’ scholars resisting technological change, because philology is an inherent part of the discipline. The study of any ancient language requires far more human involvement than a machine can provide: a careful hand to conserve and restore manuscripts, a keen eye for epigraphic analysis and a well-rounded, learned mind to interpret literature in medias res, politically, theologically and socially. However, for researchers who are open to computer-assisted technology, #LancsBox fills a much-needed gap in historical linguistics, especially in the field of Semitic historical syntax.

As a case in point, consider the Aramaic language: the longest continuously spoken Semitic language, with an attested lifespan of approximately 3,000 years. Aramaic offers linguists intriguing insights into how human languages change over a substantial period of time, including changes in their underlying structure (i.e., grammar and syntax). If these changes can be substantiated, they may offer important cues about the evolution of human cognition itself. Yet the historical syntax of Aramaic remains largely underrepresented and understudied. Only a few scholars have commendably undertaken the task of analysing developments in areas of Aramaic grammar (e.g., Huehnergard, 2005; Rubin, 2005; Grassi, 2009; Pat-El, 2012; Coghill, 2012). Among other reasons, the lack of rigorous study in this discipline is due to the labour-intensive task of qualitatively analysing large corpora. This task is made more difficult by manual transcription and grammatical tagging, in addition to administrative duties such as record management and categorisation. Recent advances in Aramaic computational linguistics, including but not limited to Handwritten Text Recognition (HTR) technology and digital archives, have significantly reduced the time needed for transcription and tagging. However, diachronic analysis of large corpora remains tedious without a free, user-friendly and accessible corpus tool like #LancsBox.

My doctoral research is among the first studies in Semitic historical linguistics to experiment with Lancaster University’s #LancsBox corpus software and analyse Aramaic syntax over time. Thus far, it has proven to be an exceptional tool for data management and diachronic analysis (see Figure 1 and Figure 2):

• Corpus management: the ease of creating, storing and analysing (sub-)corpora based on variables of interest (e.g., by dialect, century, author) reduces administrative overhead and gives me more time to test different hypotheses across multiple variables.

• POS-tagging: in addition to offering POS tagging in a number of languages, #LancsBox caters to self-tagged corpora. This means I can import datasets that have been annotated according to my own tagging scheme, which gives me flexibility when testing the robustness of tag sets according to various theoretical frameworks.

As with any software, a few caveats are worth mentioning for historical Semitic linguists interested in using #LancsBox for their research.

• Coding: basic knowledge of regular expressions is needed to run meaningful, in-context searches.

• Font: in its current version (5.0), #LancsBox only partially supports Aramaic, with letters appearing disconnected in some fonts. This makes in-tool legibility difficult, but not impossible.

• Text direction: in its current version (5.0), Aramaic texts appear reversed (e.g., “cat” appears as “tac”). Current workarounds include (1) using free online tools to reverse the text prior to import, or (2) conducting the analysis outside the tool.
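The first workaround can also be scripted instead of using an online tool. Below is a minimal Python sketch. It assumes that the display problem amounts to a reversal of character order within each line, so check it on a short sample before converting a whole corpus. `reverse_line` and `reverse_file` are hypothetical names.

Pointed Aramaic text uses combining vowel marks. For that reason, the sketch does not reverse raw code points. Instead, it keeps each combining mark (Unicode category M) attached to the character it follows.

```python
import unicodedata


def reverse_line(line: str) -> str:
    """Reverse the character order of one line of text.

    Combining marks (Unicode category "M*", e.g. vowel points) stay
    attached to the base character they follow, so pointed text is not
    scrambled by the reversal.
    """
    clusters: list[str] = []
    for ch in line:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch  # glue the mark onto the preceding cluster
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))


def reverse_file(src_path: str, dst_path: str) -> None:
    """Write a UTF-8 copy of src_path with every line reversed (hypothetical helper)."""
    with open(src_path, encoding="utf-8") as src, open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(reverse_line(line.rstrip("\n")) + "\n")


print(reverse_line("cat"))  # tac
```

Attaching marks to their base letter is a simplification of full grapheme segmentation. It does, however, prevent vowel points from landing on the wrong letter after reversal.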

Will #LancsBox become the MO for future historical linguists? Only time will tell. It seems to me to be the only accessible software currently available for linguists who wish to build and design their own corpus, especially for underrepresented and under-resourced languages. In fact, I can think of a number of innovative applications outside the research domain as well: for example, Australian linguists might be able to use #LancsBox to investigate which linguistic features have been declining in student writing over the last decade. Perhaps, then, #LancsBox’s core functionalities could serve academics in other fields and a wider group of users.

Watch a 60-second video of Charbel El-Khaissi’s research here.

Acknowledgements: Thank you Professor Tony McEnery, Dr Pierre Weill-Tessier and Dr Vaclav Brezina whose innovations have enabled my research. I express gratitude to my supervisory panel for their ongoing guidance.

British Muslims Caught Amidst FOGs – A Discourse Analysis of Religious Advice and Authority

By Usman Maravia

In this blog entry, I provide an overview of my latest article, which explores the writing style of Islamic advice texts on COVID-19. These advice texts addressed mosque closures, funerary rites, fasting during Ramadan, and the suspension of Friday and daily prayers to help curb the spread of COVID-19. The texts were circulated in the UK in March and April 2020. This was a crucial period in which information was passed on to address issues that, within the scope of the study, British Muslims would face during Ramadan, which began on 25th April 2020.

The context

My interest in this topic was sparked by the tragic COVID-19-related death of an elderly Muslim from Walsall. A family member of the deceased stated in the press that “It is imperative that we learn from this tragic loss and comply with Government guidelines to save lives”. It further struck me that, if the aim of the Islamic advice documents was to help Muslims stay safe during the pandemic, a unified and standardised message developed collaboratively by Muslim faith leaders and health professionals would have been helpful. Instead, a range of documents was circulating. This led to ambiguity about exactly which preventative measures British Muslims were to take and where authority lay.

Moreover, the titles of these documents differed. Some were titled fatwa, a non-binding legal opinion issued by an Islamic legal expert, yet still a document that could carry considerable influence in Muslim communities in the UK. Some documents were written by healthcare professionals and were titled guidance documents. I wondered: do these documents carry the same weight as fatwas? Still other documents were titled neither fatwa nor guidance but were written in a hybrid style of the two categories. Again, I wondered why these words were used in their titles.

The FOG corpus

As such, I sought to identify (a) the underlying reasons behind the titling of the documents and (b) the construction of discourses in the documents. My colleagues Zhazira Bekzhanova (Astana IT University, Kazakhstan), Mansur Ali (Centre for the Study of Islam in the UK, Cardiff University) and Rakan Alibri (University of Tabuk) and I collected a total of 76 texts from British mosque websites, Facebook pages and other online venues. Of these 76 documents, 14 were clearly titled fatwa and six were titled guidance documents. The remaining, eye-catching 56 documents, which we refer to as other documents, included a range of words in their titles such as analysis, clarification, confirmation, guidelines, method, pathway, permissibility, plan of action, points, recommendation, response, ruling, and statement. This classification gave rise to our jocular acronym FOG, i.e. fatwas, other documents and guidance documents. The compilation then led to the creation of the specialised FOG corpus of around 110,000 words.

We examined these written electronic texts in the social context of Muslims and COVID-19 in the UK, exploring how language was used in real life in fatwas, guidance documents and other documents. We then focused on how the authors of these documents differed in their writing styles. In particular, we looked at how they sought to create a certain impression on the audience by increasing, in Bourdieu’s terms, their symbolic capital. We also focused on the representation of social actors (van Leeuwen, 1995) to decipher power relations across the FOG documents. Besides the producers of these documents, references to the audience were also analysed, explained and interpreted through the prism of authority.

Corpus methods

We applied corpus-assisted critical discourse analysis, which helped us to uncover important patterns in relation to FOGs. Using AntConc, we examined word frequencies and word lists, lexical bundles, collocations, concordance plots and concordances to detect linguistic patterns in the FOG corpus. Corpus methods also provided us with tools to detect power hierarchies and inequalities within the texts. Moreover, our corpus-assisted study supports Brookes and McEnery’s finding that texts acquire symbolic capital through an accumulation of patterns of textual cohesion and rhetorical strategies. We found that the documents appear to follow an underlying hierarchy among British Muslim scholars.
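AntConc extracts lexical bundles itself through its Clusters/N-Grams tool. The sketch below only illustrates the idea behind them: recurrent word sequences that pass a frequency threshold and occur in a minimum number of texts. It is a minimal Python sketch under stated assumptions. The tokeniser, the bundle length and both thresholds are hypothetical illustration values, not the settings used in the study. The toy sentences simply reuse phrases quoted from the fatwas.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case word tokenizer (a simplification; AntConc has its own token rules)."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_bundles(texts, n=3, min_freq=2, min_range=2):
    """Return {bundle: (frequency, number_of_texts)} for recurrent n-grams.

    min_freq and min_range are illustrative thresholds, not the study's settings.
    """
    freq = Counter()        # total occurrences of each n-gram
    text_range = Counter()  # number of texts each n-gram occurs in
    for text in texts:
        tokens = tokenize(text)
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        freq.update(grams)
        text_range.update(set(grams))
    return {
        " ".join(g): (f, text_range[g])
        for g, f in freq.items()
        if f >= min_freq and text_range[g] >= min_range
    }


toy_texts = ["Allah knows best.", "And Allah knows best", "According to the scholars."]
print(lexical_bundles(toy_texts))  # {'allah knows best': (2, 2)}
```

Requiring a minimum number of texts (range), and not only a raw frequency, keeps a phrase repeated many times within a single document from being counted as a bundle.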

Findings

To elaborate, a particular writing style can be found across the FOG documents. We found fatwas and guidance documents to be diametrically opposed in textual terms. Other documents, by contrast, showed greater intertextuality and maintained respect for the authority of muftis and their fatwas, albeit with reservations. The fatwas were written by senior muftis and contained important references to the Qur’an and to Muhammad, the Prophet of Islam. They also included Arabic legal terminology related to Shariah law, as well as phrases such as ‘according to’ and ‘Allah knows best’.

Such a writing style accords with the traditional style of fatwas and thereby carries higher symbolic capital. Guidance documents, on the other hand, were produced by healthcare professionals and did not contain such theological phrases, relying instead on scientific and medical language. Interestingly, the other documents were written in a hybrid style that combines features of fatwas and guidance documents. This style appears both to increase the symbolic capital of these documents and to empower their writers to challenge existing fatwas while maintaining respect for senior muftis.

While the FOG documents reveal that multiple voices are welcome in addressing a national emergency, we recommend standardised documents, issued in collaboration with the NHS and senior muftis, which could give British Muslims a clearer action plan in future. This study is also intended to give impetus to social scientists to explore the discourse of British Muslims and COVID-19 through a linguistic lens.

Our article is available to read in MDPI’s open access journal Religion. Additionally, further research on COVID-19 is being carried out by the British Islamic Medical Association (BIMA) as part of ‘Operation Vaccination’.

For my article on addressing vaccine resistance from an Islamic perspective, please read Vaccines: religio-cultural arguments from an Islamic perspective published by JBIMA.

‘Face masks’ and ‘face coverings’ in the UK press during the Covid-19 pandemic: Scottish vs. national newspapers

Carmen Dayrell, Isobelle Clarke and Elena Semino (Lancaster University)

1 Introduction

Since the beginning of the Covid-19 pandemic, the use of face masks or face coverings as a means of reducing the transmission of the virus has been a major area of debate in many countries around the world. In the UK specifically, the first nine months of 2020 saw a rapid change from a view of face masks as a medical piece of PPE that would not be appropriate or acceptable for the general population, to the establishment of non-surgical face coverings as a recommended public health measure in indoor public spaces, such as buses and supermarkets. As with other aspects of the response to the pandemic, during that time there were differences in the approach to face masks/coverings between the Scottish devolved administration and the Westminster government.

Table 1 provides a timeline summary of policy decisions concerning face masks/coverings on public transport, in shops and in schools in Scotland and England. For the most part, face coverings were recommended or made mandatory earlier in Scotland than in England. They are also mandatory in corridors and communal areas in Scottish schools, whereas in England this is at the school’s discretion.

| Month | Public transport | Shops | Schools |
|---|---|---|---|
| April | (28th) Scotland (recommended) | (28th) Scotland (recommended) | |
| May | (11th) England (recommended) | (11th) England (recommended) | |
| June | (15th) England (mandatory); (22nd) Scotland (mandatory) | | |
| July | | (10th) Scotland (mandatory); (24th) England (mandatory) | |
| August | | | (31st) Scotland (mandatory in corridors and communal areas) |
| September | | | (1st) England (school/college discretion in indoors communal areas) |
Table 1 – Timeline of policy decisions about the wearing of face coverings by the general public in Scotland vs. England.

Scotland has also had a lower incidence of Covid-19 than England. According to official UK government data, as of 30th December, 23 people per 1,000 in Scotland had had at least one positive Covid-19 test, compared with 39 people per 1,000 in England.

This blog post is concerned with references to face masks and face coverings in Scottish vs. national UK newspapers between December 2019 and August 2020, that is from the start of reports about a new type of pneumonia in Wuhan, China, up to the beginning of the 2020-21 school year in the UK.

2 Research questions

Overarching research question

How does press reporting on face masks and face coverings in Scotland compare with national UK reporting between December 2019 and August 2020?

Specific research questions

  1. How did the frequency of use of ‘face covering(s)’ vs. ‘face mask(s)’ change over time in Scottish vs. national press reporting?
  2. Were there any statistically significant differences in the relative frequencies of the use of ‘face mask(s)’ and ‘face covering(s)’, and of terms relating to places where face masks/coverings may be used, in Scottish vs. national press reporting?
  3. What are the differences and similarities in the collocations (co-occurrence of words) of ‘face mask(s)’ vs. ‘face covering(s)’ in Scottish and national press reporting?

3 Findings in brief

Finding 1 – Over time, ‘face covering(s)’ became more frequent than ‘face mask(s)’ in the Scottish press, but not in the national press.

Finding 2 – ‘Face covering(s)’ are mentioned much more often, relatively speaking, in the Scottish press than in the national press, alongside other terms for public indoor environments where they may be worn.

Finding 3 – Face ‘mask(s)’ and ‘covering(s)’ have partly different collocates, reflecting differences in status and associated narratives.

4 Data

The news aggregator service LexisNexis was used to collect articles that contained either the phrase ‘face mask(s)’ or ‘face covering(s)’ and that were published in a selection of national and Scottish newspapers in the period between 01.12.2019 and 31.08.2020.

Table 2 provides the numbers of texts and words included in each of the two resulting corpora: the Scottish Corpus and the National Corpus. For the National Corpus, we also provide figures for articles extracted from ‘broadsheet’ vs. ‘tabloid’ newspapers, which make up the Broadsheet and Tabloid subcorpora. (NB: For the national newspapers specifically, we selected the national editions only, thus excluding the Irish, Scottish and Northern Ireland editions.) Figures 1 to 4 below show the number of articles per newspaper title within each corpus.

| Corpus | Number of texts | Number of words |
|---|---|---|
| National corpus | 11,536 | 19,401,316 |
| The Broadsheet subcorpus | 6,631 | 16,657,194 |
| The Tabloid subcorpus | 2,419 | 1,264,952 |
| Scottish corpus | 1,084 | 588,894 |
Table 2: Number of texts and total number of words comprising each corpus
Figure 1: Number of texts from each national title

Figure 2: Number of texts from each broadsheet title
Figure 3: Number of texts from each tabloid title

Figure 4: Number of texts from each Scottish newspaper title

The Broadsheet subcorpus is by far the larger of the two national subcorpora, both in terms of the number of texts and the number of words, and it is much larger than the Scottish Corpus (Table 2). Within that subcorpus, The Guardian and The Observer account for the highest number of articles, corresponding to 36% of texts and 83% of the words in that subcorpus (13,744,333 out of 16,657,194). The number of texts is more evenly distributed in the Tabloid subcorpus (Figure 3). The Daily Mail accounts for the largest share of texts (20%), but it is closely followed by The Express (17%) and by The Sun and the Evening Standard (15% each). Within the Scottish corpus, most texts come from The Daily Record and The National (32% each).

5 Method

To answer research question 1, we plotted the frequencies of the search terms used to collect the texts in the corpora, ‘face mask(s)’ and ‘face covering(s)’. These figures indicate how the level of attention fluctuated in the National and Scottish press over time.

To answer research question 2, we carried out a ‘keyword’ analysis of the Scottish Corpus as compared with the National Corpus as a whole. Keywords are words that are much more frequent in a corpus of interest (known as the ‘study’ corpus) than they are in another corpus (known as the ‘reference corpus’), where the difference is statistically significant. They can be interpreted as reflecting the most distinctive concepts and themes in a particular corpus. The analysis was carried out using WordSmith Tools, version 7.

For the calculation of keywords, we established that a candidate keyword should occur in at least 5% of texts in the study corpus. This determined the minimum frequency of each term, which varied from one corpus to another: 577 instances in the National Corpus and 54 in the Scottish Corpus. For the statistical tests, we combined the log-likelihood test (a measure of statistical confidence) with log-ratio as the effect-size measure. The thresholds were a log-likelihood critical value higher than 15.13 (p < 0.0001) and a minimum log-ratio score of 1.5, discarding negative scores. Keywords were then grouped by theme through close reading of the concordance lines, that is, individual occurrences of each word with the preceding and following stretches of text.
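The keyword criteria above can be expressed as a short Python sketch. It assumes the standard two-cell log-likelihood formula and the log-ratio measure, i.e. the binary log of the ratio of relative frequencies, with 0.5 substituted for a zero frequency. WordSmith Tools’ own implementation may differ in detail, and the function names are hypothetical. Rounding 5% of the number of texts in a corpus reproduces the minimum frequencies reported above (577 and 54).

```python
import math

LL_CRITICAL = 15.13   # chi-square critical value, 1 d.f., p < 0.0001
MIN_LOG_RATIO = 1.5   # effect-size threshold (positive keywords only)


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood for a word with frequency a in the study corpus
    (c tokens) and b in the reference corpus (d tokens)."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def log_ratio(a: int, b: int, c: int, d: int) -> float:
    """Binary log of the ratio of relative frequencies; zero counts are
    replaced by 0.5 so the ratio stays finite."""
    a_adj = a if a > 0 else 0.5
    b_adj = b if b > 0 else 0.5
    return math.log2((a_adj / c) / (b_adj / d))


def minimum_frequency(n_texts: int, share: float = 0.05) -> int:
    """Frequency floor derived from 5% of the texts in a corpus."""
    return round(n_texts * share)


def is_keyword(a: int, b: int, c: int, d: int, n_study_texts: int) -> bool:
    """Apply the frequency floor, the significance test and the effect size."""
    return (a >= minimum_frequency(n_study_texts)
            and log_likelihood(a, b, c, d) > LL_CRITICAL
            and log_ratio(a, b, c, d) >= MIN_LOG_RATIO)


print(minimum_frequency(11_536), minimum_frequency(1_084))  # 577 54
```

Using a significance test together with an effect-size threshold filters out words that are significant only because the corpora are large, not because their frequencies differ substantially.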

To answer research question 3, we carried out a ‘collocation’ analysis of the terms ‘face mask(s)’ and ‘face covering(s)’. Collocation analyses explore co-occurrence relationships between words and therefore make it possible to study the narratives or discourses that a word is part of. A word collocates with another if it is more likely to be found in close proximity to the other word than elsewhere. Collocations were generated with the software package #LancsBox, using the criteria below:

  • Span of 5:5 – a window of five words to the left and five words to the right of the search word.
  • Mutual Information (MI) score ≥ 6. MI is a statistic widely used in corpus studies to indicate how strong the association between two words is. It is calculated by comparing how often the two words co-occur with how often they would be expected to co-occur, given their individual frequencies in the corpus.
  • Minimum frequency of collocation: 10 occurrences per 1,000 instances of the term in question. For example, if ‘face mask(s)’ occurs 1,672 times in a corpus, the minimum frequency of collocation is 17 instances.
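A simplified version of this collocation procedure is sketched below in Python. It assumes the node has already been reduced to a single token, e.g. by joining ‘face mask’ into a hypothetical token such as `face_mask`. It uses a common formulation of MI: log2 of observed over expected co-occurrence, with expected = f(node) × f(collocate) / N. #LancsBox may compute the expected value differently, so the scores will not necessarily match the tool’s. The function names are hypothetical.

```python
import math
from collections import Counter


def min_collocation_freq(node_freq: int, per_thousand: int = 10) -> int:
    """10 co-occurrences per 1,000 instances of the node (rounded)."""
    return round(node_freq * per_thousand / 1000)


def collocates(tokens, node, span=5, min_mi=6.0):
    """Collocates of a single-token node within `span` words on each side,
    filtered by MI and by the relative frequency floor."""
    n = len(tokens)
    freq = Counter(tokens)
    observed = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            observed.update(window)
    floor = min_collocation_freq(freq[node])
    results = []
    for word, o in observed.items():
        expected = freq[node] * freq[word] / n  # simplified expected frequency
        mi = math.log2(o / expected)
        if mi >= min_mi and o >= floor:
            results.append((word, o, mi))
    # most frequent collocates first
    return sorted(results, key=lambda r: r[1], reverse=True)


print(min_collocation_freq(1_672))  # 17
```

The frequency floor matters because MI on its own tends to favour rare words that happen to co-occur with the node once or twice.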

Similar to the analysis of keywords, collocations were analysed by close reading of their concordance lines.
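Close reading relies on the key-word-in-context (KWIC) display that both WordSmith Tools and #LancsBox provide. A minimal Python version of that display for a single-token node is sketched below. The function name and the 40-character alignment are hypothetical choices for illustration.

```python
def concordance(tokens, node, width=5):
    """Key-word-in-context lines: `width` tokens either side of each hit,
    with the left context right-aligned so the node words line up."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>40} [{tok}] {right}")
    return lines


sentence = "wear a face covering on public transport".split()
for line in concordance(sentence, "covering", width=3):
    print(line)
```

Aligning the node in a single column lets the analyst scan left and right contexts across many hits at once, which is what makes thematic grouping by close reading practical.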

6 Findings

Finding 1 – Over time, ‘face covering(s)’ became more frequent than ‘face mask(s)’ in the Scottish press, but not in the national press.

Figures 5-6 show the frequency distribution of the terms ‘face mask(s)’ and ‘face covering(s)’ in the two corpora across time, considering the relative frequencies of terms (per 100,000 words). Note that the scale varies from one chart to another; that is due to differences in the amount of data from each corpus.
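The normalisation behind these figures can be sketched as follows. The sketch assumes that each article is available as a (month, text) pair and that words are counted by splitting on whitespace. That word count is an assumption, and the corpus tools’ own word counts may differ. The regular expressions match the two search terms in singular and plural form. The function and variable names are hypothetical.

```python
import re
from collections import defaultdict

# Search terms in singular and plural form, case-insensitive.
PATTERNS = {
    "face mask(s)": re.compile(r"\bface masks?\b", re.IGNORECASE),
    "face covering(s)": re.compile(r"\bface coverings?\b", re.IGNORECASE),
}


def monthly_relative_freqs(articles, per=100_000):
    """articles: iterable of (month, text) pairs such as ("2020-04", "...").
    Returns {month: {term: hits per `per` words}}."""
    hits = defaultdict(lambda: defaultdict(int))
    words = defaultdict(int)
    for month, text in articles:
        words[month] += len(text.split())  # whitespace word count (an assumption)
        for term, pattern in PATTERNS.items():
            hits[month][term] += len(pattern.findall(text))
    return {
        month: {
            term: (hits[month][term] * per / words[month]) if words[month] else 0.0
            for term in PATTERNS
        }
        for month in sorted(words)
    }
```

Normalising per 100,000 words is what makes months, and corpora, of very different sizes comparable. Raw counts would mostly reflect how much was published in a given month.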


Figure 5: Relative frequencies of ‘face covering(s)’ and ‘face mask(s)’ in the National Corpus

Figure 6: Relative frequencies of ‘face covering(s)’ and ‘face mask(s)’ in the Scottish Corpus

As can be seen, both corpora show a clear preference for the term ‘face mask(s)’ in the early months, from December 2019 to March 2020, with hardly any mention of the term ‘face covering(s)’. Scottish newspapers seem to have embraced the term first, with mentions of ‘face covering(s)’ increasing swiftly in April 2020, corresponding to nearly half of the number of mentions of ‘face mask(s)’ in that month (83 as compared with 181 instances). National newspapers showed a modest increase in the mentions of ‘face covering(s)’ in April; the term ‘face mask(s)’ was nearly six times more frequent than ‘face covering(s)’ in the national newspapers (2,241 in relation to 386 instances). Mentions of ‘face covering(s)’ continued to rise across both corpora in the following months. In May, they represented about half of the number of mentions of ‘face mask(s)’ in the Scottish corpus and about a third in the National Corpus. By June, mentions of ‘face covering(s)’ surpassed those of ‘face mask(s)’ in Scottish newspapers. In national newspapers, ‘face mask(s)’ remained more frequent than ‘face covering(s)’ across the entire period.

Finding 2 – ‘Face covering(s)’ are mentioned much more often, relatively speaking, in the Scottish press than in the national press, alongside other terms for public indoor environments where they may be worn.

The words ‘covering’ and ‘coverings’, which tend to occur in the phrase ‘face covering(s)’, were found to be ‘key’ or ‘overused’ in the Scottish as compared with the National Corpus. In other words, ‘covering’ and ‘coverings’ are used much more often, in terms of relative frequencies, in the Scottish Corpus than in the National Corpus, based on our thresholds for effect size (log-ratio) and statistical significance (log-likelihood). However, based on the same thresholds, the word ‘mask(s)’ is not overused in the National corpus as compared with the Scottish Corpus. This means that ‘covering(s)’ in the Scottish Corpus is not in complementary distribution to ‘mask(s)’ in the National Corpus.

Overall, the keyword calculation retrieved 41 overused items in the Scottish Corpus, using the National Corpus as the reference corpus. Table 3 includes the complete list of keywords in the Scottish Corpus, grouped thematically and then ordered by their frequency of occurrence in the corpus.

Table 3 shows that the keywords in the Scottish Corpus include three other terms that are related to face coverings (‘mandatory’, ‘worn’ and ‘mouth’) as well as groups of words that relate to the different environments where face coverings may or may not be recommended or mandatory: Space (e.g. ‘indoor’, ‘outdoor’, ‘household’), Retail/hospitality (e.g. ‘shop’, ‘hospitality’) and Education (e.g. ‘pupils’, ‘teachers’).

Table 3: ‘Keywords’ in the Scottish Corpus, grouped by theme

The overuse of the word ‘kids’ reflects discussions about the age at which face masks/coverings should be made compulsory, as expressed by a reader’s comment published by The Glasgow Evening Times (Extract 1):

(1) “I AM so confused myself. Our kids are going with no distancing and in shops and malls and cinemas and public transport and airports. There is this hype of distancing. Which one is right? Are the poor kids so strong that they will not catch it at all and will not bring anything back home to their elderly grans etc? So illogical!” (The Glasgow Evening Times, 21.08.2020)

The keywords also include a group relating to Other Measures to reduce contagion, particularly in public spaces such as shops, restaurants and pubs (e.g. ‘screens’, ‘two-metre’). This is because face coverings are often presented as necessary when those other measures are not practicable:

(2) The government guidance says: “If you can, wear a face covering in an enclosed space where social distancing isn’t possible and where you will come into contact with people you do not normally meet.” (The National, 25.06.2020)

Finding 3 – Face ‘mask(s)’ and ‘covering(s)’ have partly different collocates, reflecting differences in status and associated narratives.

We now examine the collocates of ‘face mask(s)’ and ‘face covering(s)’ in the two corpora. These are listed in Tables 4 and 5, in decreasing order of frequency of co-occurrence with each term.


Table 4: Collocations of ‘face mask(s)’ and ‘face covering(s)’ in the Scottish Corpus

Table 5: Collocations of ‘face mask(s)’ and ‘face covering(s)’ in the National Corpus

Five words appeared as collocates of both ‘face mask(s)’ and ‘face covering(s)’ in both corpora. These are: three different forms of the verb ‘wear’ (‘wear’, ‘wearing’, ‘worn’), ‘compulsory’ and ‘mandatory’. These suggest that ‘mask(s)’ and ‘covering(s)’ are both used in the context of debates and decisions about the need or obligation to wear them in certain settings.

Figure 7: Instances of ‘face mask(s)’ in the Scottish Corpus

However, the collocates that only apply to ‘face mask(s)’ show that they tend to be talked about as a type of PPE in clinical or care settings (e.g. ‘protective’, ‘surgical’, ‘gloves’, ‘aprons’).

(3) Carers, many of whom are paid low wages by private sector firms, have complained they have not been provided with essential items such as hand sanitiser, gloves, aprons, and face masks. (The Independent, 24.03.2020)

In contrast, the collocates that only apply to ‘covering(s)’ show that they tend to be talked about as a non-medical item of clothing that is:

  • made of cloth and a potential fashion accessory or political statement (‘cloth’, ‘branded’);

(4) Currently no other party is selling branded face coverings, although many independent online shops stock masks with Union flag or political designs. (The National, 25.07.2020)

(5) Face coverings include scarves, a piece of cloth or a mask and certain travellers – such as people with disabilities or breathing difficulties – will be exempt. (The Daily Express, 06.06.2020)

  • recommended to be worn (e.g. ‘recommended’, ‘advised’);

(6) Earlier this week, First Minister Nicola Sturgeon recommended the limited use of face coverings – not necessarily masks – when social distancing is hard to maintain. (Glasgow Evening Times, 04.05.2020)

(7) Other precautions advised include wearing face coverings in public as much as possible, keeping two metres apart, avoiding physical contact with those outside one’s household and to be tested and isolate if told to do so. (The Telegraph, 18.07.2020)

  • in specific indoor public settings (‘crowded’, ‘enclosed’; ‘shops’, ‘transport’);

(8) “However, we are recommending you do wear a cloth face covering if you are in an enclosed space with others where social distancing is difficult – for example, on public transport, or in a shop.” (The National, 28.04.2020)

(9) It is compulsory to wear face coverings on public transport, in shops and when collecting takeaway food. (The Sun, 14.08.2020)

  • by large sections of the population (‘secondary’, ‘pupils’, ‘passengers’).

(10) A SECONDARY school is asking pupils to wear face coverings as part of efforts to combat the spread of coronavirus. (The Herald, 23.08.2020)

(11) Passengers have been told to wear face coverings on public transport to prevent a further outbreak of coronavirus as Britain slowly emerges from the lockdown. (The Times, 12.05.2020)

What does not, however, emerge from the collocates of ‘face covering(s)’ in either corpus is a consistent message about their role in protecting others from droplets produced by the wearer, thus reducing transmission overall. This may partly explain ongoing opposition to or scepticism about the usefulness of face coverings during the pandemic.

7 Conclusions

Overall, in the period December 2019 – August 2020, reports on face mask(s)/covering(s) in the Scottish press contrasted with the national press in terms of: a preference for ‘face covering(s)’ over ‘face mask(s)’ from April 2020 onwards; and a greater concern for their use to mitigate the transmission of the virus in schools, shops and other public indoor environments. This can only be partly explained by the fact that the Westminster government made decisions about the recommended/mandatory use of face coverings in public indoor spaces slightly later than the Scottish devolved administration. The contrasting collocates of ‘face covering(s)’ vs. ‘face mask(s)’ confirm that they are associated with different settings and narratives: PPE in clinical/care settings vs. item of clothing/accessory to be worn in public indoor environments by the general population as a public health measure. In the period under consideration, the latter narrative was therefore increasingly prevalent in Scottish but not in national newspapers.

Introductory Blog – Hanna Schmueck

I am very honoured to have received the Geoffrey Leech Outstanding MA Student Award for my MA in Language and Linguistics. This award traditionally goes to the MA student with the highest overall average.

I started my postgraduate journey in September 2019 after finishing my undergraduate degree at the University of Bamberg (Germany) in 2018 and working as a freelance translator and teacher for a year. I’ve always been interested in the way language influences us both as individuals and as a society, and I have long been fascinated by experimentation and statistics. I first discovered corpus linguistics in the second year of my undergraduate degree, and it soon became my primary research interest. I chose a corpus-based project for my undergraduate dissertation on pronouns in the English-lexifier lingua franca Bislama. Through this project I realised that much of the relevant methodological literature had been published by Lancaster academics. This confirmed my decision to apply to Lancaster despite having to move abroad and face a number of Brexit-related administrative hurdles.

When I finally came to Lancaster for my MA, I felt welcome in the department from day one. I had the chance to attend or audit a wide variety of modules such as Cognitive Linguistics, Experimental Approaches to Language and Cognition, Forensic Linguistics, Stylistics, and Corpus Linguistics. The freedom of choice given to Lancaster MA students in Language and Linguistics was another major motivation for studying at Lancaster, and the flexible approach really benefited my learning experience. Another important element of my academic experience was being able to attend research groups, such as the Trinity group and the UCREL talks. These cover a wide variety of topics and allow you to come into contact with people who have all kinds of specialisms, while giving you the opportunity to develop your own research interests further.

Like everyone else, I had not foreseen that my MA would move online in spring, or all the challenges COVID-19 would bring, but after an initial period of adjustment I tried my best to see this as an opportunity to focus on my MA thesis, titled “More than the sum of its parts: Collocation networks in the written section of the BNC2014 Baby+”. The aim of this thesis was to explore corpus-wide collocation networks and their structural and graph-theoretical properties, using the BNC2014 Baby+ as the underlying dataset. I developed a method to create and display large weighted networks based on MI2 scores, which allowed me to analyse the meta-level collocational patterns that emerge, and I then performed a graph-theoretical analysis on them. The results of this pilot study suggested that all sections of the BNC2014 Baby+ share an underlying structure, and that the structure of the generated networks resembles networks from a wide variety of other phenomena, such as power grids, social networks, and networks of brain neurons. The findings also indicated, however, that there are text-type-specific differences in how strongly different topic areas are connected, and that certain words serve as hubs connecting topics with one another. The network displayed below is an example taken from the academic books section of the BNC2014 Baby+, with a filter applied so that only the node “award”, its direct neighbours and their weighted interrelations are shown.
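The post does not spell out how the MI2-weighted network was built, so what follows is only a minimal Python sketch under stated assumptions, not the author’s method. It assumes MI2 is computed as log2(O² / E), where O is the number of co-occurrences within a window of ±span tokens and E = f(node) × f(collocate) / N, without the window-size correction some tools apply. The function names mi2_network and ego_network, the window size and both thresholds are hypothetical. The second function mimics the filter described above: it keeps one node, its direct neighbours and the weighted links among them.

```python
import math
from collections import Counter


def mi2_network(tokens, span=3, min_cooc=2, threshold=3.0):
    """Build an undirected, MI2-weighted collocation network (hypothetical helper).

    Assumptions: co-occurrence is counted within +/- `span` tokens,
    expected frequency is E = f(a) * f(b) / N (no window-size correction),
    and MI2 = log2(O**2 / E). The default thresholds are illustrative only.
    """
    n = len(tokens)
    freq = Counter(tokens)
    cooc = Counter()
    for i, w in enumerate(tokens):
        # Look only to the right, so each co-occurrence in the window is counted once.
        for j in range(i + 1, min(i + span + 1, n)):
            v = tokens[j]
            if v != w:
                cooc[frozenset((w, v))] += 1
    edges = {}
    for pair, observed in cooc.items():
        if observed < min_cooc:
            continue
        a, b = tuple(pair)
        expected = freq[a] * freq[b] / n
        score = math.log2(observed ** 2 / expected)
        if score >= threshold:
            edges[pair] = score  # edge weight = MI2 score
    return edges


def ego_network(edges, node):
    """Keep `node`, its direct neighbours, and the weighted links among them."""
    neighbours = {w for pair in edges if node in pair for w in pair} - {node}
    keep = neighbours | {node}
    return {pair: score for pair, score in edges.items() if pair <= keep}
```

Calling ego_network(mi2_network(tokens), "award") on a tokenised text would return only the edges around “award”; drawing the graph or computing graph-theoretical measures would need a separate network library, which is left out of this sketch.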

I am very grateful for having had the opportunity to learn from and exchange ideas with so many amazing academics in the department over the course of my MA and I’m very excited to carry on researching collocation networks for my PhD here at Lancaster.

ICR Outstanding Corpus Thesis Award for Lancaster PhD graduate

I am honoured to have received the Institute for Corpus Research Outstanding Doctoral Thesis Award. The purpose of this annual award is to recognise and reward theses in the field of Corpus Linguistics.

I conducted my PhD research in the Centre for Corpus Approaches to Social Science at Lancaster University, which is part of the Department of Linguistics and English Language. My thesis was titled Collocational Processing in Typologically Different Languages, English and Turkish: Evidence from Corpora and Psycholinguistic Experimentation. Some of the findings based on my PhD research are reported in this article. The study was multidisciplinary, involving both corpus analysis and psycholinguistic experimentation. My supervisors, Dr Vaclav Brezina and Prof Patrick Rebuschat, played a key role in shaping the thesis. Their academic knowledge and insight were invaluable in developing the multidisciplinary perspective needed to pioneer a contrastive study of English and Turkish.

Turkish, with its rich morphology, differs from English – prompting questions about whether the same variables affect collocational processing in the two languages. Importantly, so far the vast majority of research on collocational processing has focussed on a narrow range of primarily European languages, especially English, which makes it difficult to generalise the findings to other languages. Corpus analyses showed that uninflected collocations have similar mean frequencies and association counts in both languages. When inflected forms were included, 75% of the Turkish collocations occurred at a higher frequency than the collocations in English, suggesting that language typology impacts frequency of collocations.

I then conducted psycholinguistic experiments to understand the differences and similarities in how collocations are processed in English and Turkish, and by native and non-native speakers of English. To what extent do native speakers of English and of Turkish differ in their sensitivity to word-level and phrase-level frequency information when processing collocations? Mixed-effects regression modelling revealed that Turkish and English native speakers are equally sensitive to collocation frequencies, confirming the psychological reality of collocations in both languages. Yet English speakers were additionally affected by individual word frequencies, indicating that speakers of typologically different languages may draw on different sources of information when processing collocations.

Furthermore, this thesis investigated the effects of individual word and collocational frequency on native and non-native speakers’ collocational processing in English. Both groups of participants demonstrated sensitivity to individual word and collocation frequency. The findings align with the predictions of usage-based approaches that language acquisition should be viewed as a statistical accumulation of experiences that changes every time we encounter a particular utterance.

This study identified both universal fundamentals and language-specific differences in collocational processing. It addressed language typology and second-language learning through a novel multidisciplinary approach that both reinforces and challenges usage-based theories of language learning, demonstrating that such theories should take typologically different languages into account to develop a broader perspective on processing.

Please see the link here for more information about this award.

If you have any questions, or are interested in working with me, please get in touch: Dr Doğuş Can Öksüz, Research Fellow at the University of Leeds, d.oksuz@leeds.ac.uk

Questioning Vaccination Discourse (Quo VaDis): A Corpus-based Study

A new three-year project based in CASS will use corpus linguistic methods to study how vaccinations (including future vaccines for Covid-19) are talked about in the UK press, UK parliamentary discourse and social media. Through collaborations with governmental and public health partners, the findings will be used to help address vaccine hesitancy, which is one of the World Health Organization’s top 10 global health challenges.

The project will start in March 2021 and is funded by the Economic and Social Research Council, part of UK Research and Innovation.

To find out more, read Lancaster University’s announcement and watch a brief introduction to the project by Principal Investigator Elena Semino.

English language assessment and training for medical professionals

Proficiency in English is crucial for effective and appropriate medical communication, and UK regulatory bodies for nurses and doctors use standardised tests (such as IELTS, OET and TOEFL) to assess the English proficiency of non-UK/EU applicants.

The aim of this project is to investigate a corpus of authentic clinical interactions to identify the patterns of interaction and language used by health professionals and, in doing so, to determine how well the English tests taken by applicants reflect English as used in ‘real life’ encounters. Our investigation will help us identify the key communication skills required to deliver effective clinical care and allow us to give industrial partners specific recommendations for language assessment and training for healthcare staff.

With a broad focus on the various participant roles within the patient journey through Emergency Departments, we are investigating how the language used by patients, nurses, doctors and other hospital staff reflects their various responsibilities and status. Specifically, we focus on the following aspects of language:

Questions: which participants ask questions throughout the encounter? How are they phrased and to what do they refer? How do health professionals check understanding?

Directives: how do health professionals issue instructions? What types of mitigation or hedging are used?

Openings: how do the participants introduce themselves and establish their roles? Do health professionals use names/titles?

Pronouns: how do participants establish and maintain individual/collective identities through the use of pronouns?

Small talk: how and when do health professionals engage in small talk with patients? Or with other health professionals?

Empathy: how is empathy expressed in the data? What kinds of empathy phrases do we observe, and do they differ according to role?

Our approach is designed to identify those recurring interactional features of Emergency Department encounters that can help inform the teaching and assessment procedures that prepare candidates for the ‘real world’ of healthcare communication.

Team

Dr Dana Gablasova (https://www.lancaster.ac.uk/linguistics/about/people/dana-gablasova) (Lead Investigator)

Dr Luke Collins (https://www.lancaster.ac.uk/linguistics/about/people/luke-collins) (Senior Research Associate)

Dr Vaclav Brezina (https://www.lancaster.ac.uk/linguistics/about/people/vaclav-brezina) (Co-Investigator)

Dr John Pill (https://www.lancaster.ac.uk/linguistics/about/people/john-pill) (Co-Investigator)

CASS joins International Consortium for Communication in Health Care

We are delighted to announce that we are joining the International Consortium for Communication in Health Care. The Consortium is led by the Australian National University, and also includes University College London, Nanyang Technological University, the University of Hong Kong and Queensland University of Technology. The aim of the Consortium is to conduct research that increases understanding of communication about illness in clinical and non-clinical settings, and to translate research findings into changes in education and practice that will improve the experiences and safety of patients.

You can read more about the launch of the Consortium here: https://www.lancaster.ac.uk/news/lancaster-university-linguists-will-help-improve-global-healthcare-communication

And here is a video introducing the Consortium’s aims and partners: https://vimeo.com/459184068/155158b30b


PhD conference prize for James Balfour

I am honoured to have received the award for “student research with the most potential for impact” at the 2020 Corpora and Discourse International Conference. The award, which included a prize of £150, was sponsored by Palgrave and decided through nominations from conference attendees. The talk can be accessed online here: https://corporadiscourse.com/healthcare-representations-videos/

The findings are part of a larger project examining how people with schizophrenia are portrayed in British newspapers. Schizophrenia, whose symptoms include auditory and visual hallucinations (e.g. ‘hearing voices’), affects roughly 1 in 100 people, yet most members of the public get all of their understanding of the disorder from the media. This is because most people are unlikely to have first-hand experience of people with the disorder. After all, mental illnesses are not visually identifiable (in the way some physical disabilities are), so no one could identify someone with schizophrenia just from the way they look. Moreover, people with schizophrenia often experience another symptom, social withdrawal, which means that some people diagnosed with the disorder may avoid contact with others. It is therefore crucial that the media provide accurate and tolerant portrayals of people with schizophrenia, so that the general public can understand the disorder and are encouraged to treat people with it with respect and compassion.

My talk focussed on differences between the language used to represent people with schizophrenia in British tabloids and broadsheets. To compare them, I identified words that were significantly more frequent in one dataset relative to the other. This revealed the words that were distinctive to either the tabloids’ or the broadsheets’ reporting on schizophrenia. I found that these words converged around a distinctive topic for each dataset. In the tabloids, this distinctive topic was “crime”: the words referred to the sentencing and imprisonment of criminals with a diagnosis of schizophrenia, or to the risks posed by deinstitutionalising patients from hospital. Hence, a distinctive feature of reporting in the tabloids was the tendency to represent people with schizophrenia as dangerous. For instance, the following example from The Star reports a story in which a patient with schizophrenia has been deinstitutionalised.
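The post does not say which statistic was used to decide that a word was significantly more frequent in one dataset than in the other. One common choice in corpus linguistics is the log-likelihood (G2) keyness statistic, so the Python sketch below assumes it; the function names and the default cut-off of 15.13 (p < 0.0001 with one degree of freedom) are illustrative assumptions, not details of the project.

```python
import math
from collections import Counter


def log_likelihood(a, b, size1, size2):
    """G2 for a word seen `a` times in corpus 1 (size1 tokens) and `b` times in corpus 2."""
    total = size1 + size2
    e1 = size1 * (a + b) / total  # expected frequency in corpus 1
    e2 = size2 * (a + b) / total  # expected frequency in corpus 2
    g2 = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # a zero observation contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def keywords(tokens1, tokens2, critical=15.13):
    """Words over-represented in corpus 1 relative to corpus 2 (hypothetical helper)."""
    f1, f2 = Counter(tokens1), Counter(tokens2)
    n1, n2 = len(tokens1), len(tokens2)
    results = []
    for word, a in f1.items():
        b = f2.get(word, 0)
        # Keep only words with a higher relative frequency in corpus 1.
        if a / n1 <= b / n2:
            continue
        g2 = log_likelihood(a, b, n1, n2)
        if g2 >= critical:
            results.append((word, g2))
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

With two hypothetical token lists, keywords(tabloid_tokens, broadsheet_tokens) and then keywords(broadsheet_tokens, tabloid_tokens) would give the two sets of distinctive words; grouping them into topics such as “crime” remains a qualitative step.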

BLUNDERS FREED KILLER FROM MENTAL HOSPITAL. A SERIES of errors left a crazed killer at large to stab to death a hero detective. (The Star, 20 May 2005).

Notice how the individual is referred to twice as a killer, which reduces his identity to his crime. Other aspects of the patient’s identity and circumstances, notably his own mental distress, are left unmentioned. Instead, he is described as crazed, a simplistic and dismissive representation of his mental health issues. The fact that he is represented as a killer even before his crime is mentioned may also mislead readers into thinking that he had committed murder before being deinstitutionalised, which was not the case.

By contrast, a distinctive topic in the broadsheets was “art and culture”. These words occur in stories in which a link is posited between psychosis and creativity. For instance, in the following excerpt from The Telegraph, the Victorian painter Richard Dadd is praised for offering in his work a representation of what the journalist paradoxically calls a genuine manic fantasy, which is contrasted with the let’s-pretend of his contemporaries, who did not experience psychosis. This suggests that Dadd’s paintings are more valuable than those of other artists because they provide insight into an unusual sensory experience.

Oberon and Titania (1854/58) by the schizophrenic Richard Dadd offers genuine manic fantasy, as opposed to the tiresome let’s-pretend of so much of the art of his contemporaries. (The Telegraph, 21 September 2003).

Linking schizophrenia with creativity is part of a much broader stereotype whereby people with health conditions are viewed as gifted in a particular area. A more familiar example is the stereotypical association between autism and mathematical prowess. These representations, while more positive, may lead people to hold expectations of people with schizophrenia that they are unlikely to be able, or may not want, to meet.

A distinctive feature of both tabloids and broadsheets, therefore, is that they represent people with schizophrenia as different: undesirably different in the tabloids and desirably different in the broadsheets. However, most people with schizophrenia are neither dangerous criminals nor talented artists, but ordinary people trying to live their lives. While it is understandable that the press likes to report on unusual people, this inevitably leads to a distorted picture of people with schizophrenia, which exacerbates misunderstandings and prejudiced beliefs. Life can already be difficult for people diagnosed with schizophrenia without the additional burden of inaccurate expectations reproduced by the media. Using findings like these, the project aims to work closely with journalists and charities to make the language we use around schizophrenia more accurate and tolerant.

If you have any questions or observations, please contact me via j.balfour@lancaster.ac.uk. For more details about this project, see: http://cass.lancs.ac.uk/author/james-balfour/

PhD conference prize for Mark Wilkinson

At the 2020 Corpora and Discourse International Conference, I was very honoured to receive an award for the conference paper “showing the greatest methodological innovation or reflexivity by a student researcher”. The award was sponsored by the Applied Corpus Linguistics journal and included a prize of £250. This year’s online conference, hosted by the University of Sussex, featured a wide variety of brilliant research from students around the world. I am truly humbled to have been nominated for the award, and I am especially grateful to my supervisor, Professor Paul Baker, for all his support and guidance during my doctoral research. I would like to take this opportunity to share with you a summary of my talk, titled “Black or gay or Jewish or whatever”: A diachronic corpus-based discourse analysis of how the UK’s LGBTQI population came to be represented as secular, cisgender, gay, white and male (available to watch here: https://corporadiscourse.com/language-gender-sexuality-videos/).

This talk emerged from my PhD research, in which I aim to map how The Times has used language to discursively construct LGBTQI identities in the UK over the past 60 years. I’m particularly interested in the histories of identity, which is why I’ve chosen to take a diachronic approach, collecting many decades of language data from one of the UK’s most influential broadsheets. This focus on history is based on an assumption adapted from post-structuralist discourse theory (Laclau & Mouffe 1985): that all identities are partially the result of consistent choices in representation made over a sustained period of time.

To get a sense of which discourses have been consistent, I looked at both consistent keywords and consistent collocates. This revealed several currents running through the corpus. First, although the search terms used to build the corpus reflected the inherent diversity within the LGBTQI population, the majority of key terms pertained to gay men. This indicates that the history of queer representation in The Times is primarily their history, while the histories of lesbian, bisexual, trans and gender non-conforming people have been largely erased or obscured. Second, an analysis of consistent collocates of the word gay showed that additional identifications such as Black and Jewish were statistically significant collocates from the 1980s onward. A closer analysis of the newspaper articles featuring this usage showed that these terms were used in one of two ways. In the first, Black and Jewish were often used as marked terms, implying that such intersecting identities were exceptional. I would therefore argue that this markedness reveals The Times’ presumption that the archetypal gay man is white and non-Jewish. In the second, the terms Black and gay, as well as Jewish and gay, were often presented as mutually exclusive categories. In other words, individuals were represented as being either Black or gay, but never both. Cumulatively, I argued that through consistent choices in representation over a sustained period of time, The Times came to represent the UK’s queer population as secular, cisgender, gay, white and male. But, as the term white was never used, how could I make this claim?
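In diachronic corpus studies, “consistent” keywords or collocates are commonly understood as items that recur across all, or most, of the time periods in a corpus. Assuming that definition (the post does not state the exact criterion used), the hypothetical Python helper below takes one list of keywords or collocates per period, for example per decade, and returns the items found in at least a given number of periods.

```python
from collections import Counter


def consistent_items(per_period, min_periods=None):
    """Items present in at least `min_periods` of the per-period lists (default: all).

    Hypothetical helper: `per_period` is a list of lists, one per time period,
    each holding that period's keywords or collocates.
    """
    if min_periods is None:
        min_periods = len(per_period)
    counts = Counter()
    for items in per_period:
        counts.update(set(items))  # count each item at most once per period
    return sorted(item for item, c in counts.items() if c >= min_periods)
```

Lowering min_periods relaxes the criterion from “every period” to “most periods”; where to set that threshold is a methodological choice the analyst would need to justify.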

Drawing on the intellectual tradition of critical race theory (Baldwin 1963; Crenshaw 1990; Morrison 1992; Hall 1997), I argued that ‘race’, while certainly a lived experience with material consequences, is not simply a neutral taxonomy of phenotypical differences between people, but rather an ideological construct that functions as a structuring force in society, such that certain bodies are given more value than others. Within this racialised matrix, whiteness is not only privileged but also passed off as neutral and universal: an unmarked category that functions largely by ‘erasing its own tracks’ (Trechter and Bucholtz 2001:10). From a linguistic perspective, then, whiteness functions ‘much like a linguistic sign, taking its meaning from those surrounding categories to which it is structurally opposed’ (Trechter and Bucholtz 2001:5). Therefore, in the data from The Times, marking some gay men as Black necessarily implies that whiteness is the unmarked default for all other gay men.

In conclusion, it was argued that these cumulative processes are not benign, but rather indicate how the power of language can erase entire groups of people from popular discourse. Furthermore, the combination of corpus data with theories from both within and beyond linguistics is essential in mapping the discursive construction and representation of identities.

References:

Baldwin, J. (1963). The Fire Next Time. New York: Dial Press.

Crenshaw, K. (1990). Mapping the margins: Intersectionality, identity politics, and violence against women of color. Stanford Law Review, 43, p.1241.

Hall, S. (1997). ‘The spectacle of the “other”’. In Hall, S. (ed.) Representation: Cultural representations and signifying practices. London: Sage.

Laclau, E. and Mouffe, C. (1985). Hegemony and socialist strategy: Towards a radical democratic politics. London: Verso.

Morrison, T. (1992). Playing in the Dark: Whiteness and the literary imagination. Cambridge, MA: Harvard University Press.

Trechter, S. and Bucholtz, M. (2001). ‘Introduction: White noise: Bringing language into whiteness studies’. Journal of Linguistic Anthropology, 11(1), pp.3-21.