Introductory Blog – Hanna Schmueck

I am very honoured to have received the Geoffrey Leech Outstanding MA Student Award for my MA in Language and Linguistics. This award traditionally goes to the MA student with the highest overall average.

I started my postgraduate journey in September 2019 after finishing my undergraduate degree at the University of Bamberg (Germany) in 2018 and working as a freelance translator and teacher for a year. I’ve always had an interest in the way language influences us both as individuals and as a society and have carried with me a fascination for experimentation and statistics. I first discovered corpus linguistics in the second year of my undergraduate degree, and it soon became my primary research interest. I chose a corpus-based project for my undergraduate dissertation on pronouns in the English-lexifier lingua franca Bislama. While working on it, I realised that much of the relevant methodological literature had been published by Lancaster academics – which cemented my decision to apply to Lancaster despite having to move abroad and face a number of Brexit-related administrative hurdles.

When I finally came to Lancaster for my MA, I felt welcome in the department from day one and I had the chance to attend or audit a wide variety of modules such as Cognitive Linguistics, Experimental Approaches to Language and Cognition, Forensic Linguistics, Stylistics, and Corpus Linguistics. The freedom of choice that Lancaster MA students in Language and Linguistics are given was another major motivation for studying at Lancaster and the flexible approach really benefited my personal learning experience. Another important element of my academic learning experience was being able to attend research groups – such as the Trinity group and UCREL talks – which focus on a wide variety of topics and allow you to come into contact with people who have all kinds of specialisms while getting the opportunity to develop your own research interests further.

I had, like all of us, not foreseen that my MA would move online in spring, nor all the challenges COVID-19 would bring about, but after the first phase of getting used to the situation I tried my best to see this as an opportunity to focus on my MA thesis titled “More than the sum of its parts: Collocation networks in the written section of the BNC2014 Baby+”. The aim of this thesis was to explore corpus-wide collocation networks and their structural and graph-theoretical properties using the BNC2014 Baby+ as the underlying dataset. I developed a method to create and display large MI2-score based weighted networks in order to analyse the meta-level collocational patterns that emerge, and performed a graph-theoretical analysis on them. The results obtained from this pilot study suggested that there is an underlying structure that all sections in the BNC2014 Baby+ share and that the structure of the generated networks resembles other networks from a wide variety of phenomena such as power grids, social networks, and networks of brain neurons. The findings indicated that there are, however, text-type specific differences in terms of how connected different topic areas are and that certain words serve as hubs connecting topics with one another. The network displayed below is an example taken from the BNC2014 Baby+ academic books section with a filter applied to only show the node “award”, its direct neighbours and their weighted interrelations.
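For readers curious about what an MI2-weighted collocation network involves, here is a minimal sketch in Python. It is not the method developed in the thesis: the function names, the one-token window, the frequency threshold and the simplified expected frequency (f1 × f2 / N, with no window-size correction) are all assumptions made for illustration. It scores each co-occurring word pair as MI2 = log2(O² / E) and then filters the result down to one node, its direct neighbours and the edges among them.

```python
import math
from collections import Counter


def mi2_network(tokens, window=1, min_cooccurrence=2):
    """Build a weighted collocation network as {(word_a, word_b): MI2 score}.

    Assumption: expected frequency is the simplified f1 * f2 / N,
    without the window-size correction that some corpus tools apply.
    """
    n = len(tokens)
    word_freq = Counter(tokens)
    pair_freq = Counter()

    # Count co-occurrences of each token with the tokens to its right.
    for i, node in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window, n)):
            if tokens[j] != node:
                # Sort the pair so the edge is undirected.
                pair_freq[tuple(sorted((node, tokens[j])))] += 1

    edges = {}
    for (w1, w2), observed in pair_freq.items():
        if observed < min_cooccurrence:
            continue  # drop rare pairs to keep the network readable
        expected = word_freq[w1] * word_freq[w2] / n
        edges[(w1, w2)] = math.log2(observed ** 2 / expected)
    return edges


def ego_network(edges, node):
    """Keep only `node`, its direct neighbours, and the edges among them."""
    keep = {node}
    for w1, w2 in edges:
        if node == w1:
            keep.add(w2)
        elif node == w2:
            keep.add(w1)
    return {pair: score for pair, score in edges.items()
            if pair[0] in keep and pair[1] in keep}


# Toy example (hypothetical data, not taken from the BNC2014 Baby+).
toy_tokens = "the award was given the award was received".split()
toy_edges = mi2_network(toy_tokens)
toy_ego = ego_network(toy_edges, "award")
```

On a real corpus the resulting edge dictionary could be passed to a graph library for the graph-theoretical measures; that step is left out here to keep the sketch self-contained.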

I am very grateful for having had the opportunity to learn from and exchange ideas with so many amazing academics in the department over the course of my MA and I’m very excited to carry on researching collocation networks for my PhD here at Lancaster.

Representations of Obesity in the News: Project update and book announcement!

Gavin Brookes and Paul Baker

We are delighted to announce the forthcoming publication of a book based on research carried out as part of the CASS project, ‘Representations of Obesity in the News’. The book, titled Obesity in the News: Language and Representation in the Press, will be published by Cambridge University Press in 2021. You can see a sneak preview of the cover here!

The book reports an analysis of a 36-million-word corpus of all UK national newspaper articles mentioning obese or obesity published over a ten-year period (2008-2017). This analysis combines methods from Corpus Linguistics with Critical Discourse Studies to explore the discourses that characterise press coverage of obesity during this period. The book explores a wide range of themes in this large dataset, with chapters that answer the following questions:

• What discourses characterise representations of obesity in the press as a whole?

• How do obesity discourses differ according to newspapers’ formats and political leanings?

• How have obesity discourses changed over time, and how do they interact with the annual news cycle?

• How does the press use language to shame and stigmatise people with obesity, and how are attempts to ‘reclaim’ the notion of obesity depicted?

• What discourses surround the core concepts of the ‘healthy body’, ‘diet’ and ‘exercise’ in press coverage of obesity?

• How do obesity discourses interact with gender, and how does this influence the ways in which men and women with obesity are represented?

• How does the press talk about social class in relation to obesity, and how do such discourses contribute to differing depictions of obesity in people from different social class groups?

• Finally, how do audiences respond to press depictions of obesity in below-the-line comments on online articles?

The book will be the latest output from this project. You can read more about our work on changing representations of obesity over time in this recent Open Access article published in Social Science & Medicine. We are also working on articles which expand our analysis of obesity and social class, depictions of obesity risk, and obesity discourses in press coverage of the coronavirus pandemic, so keep your eyes peeled for further announcements!

English language assessment and training for medical professionals

Proficiency in English is crucial for effective and appropriate medical communication, and UK regulatory bodies for nurses and doctors use standardised tests (such as IELTS, OET, TOEFL) to assess the English proficiency of non-UK/EU applicants.

The aim of this project is to investigate a corpus of authentic clinical interactions to identify patterns of interaction and language used by health professionals and as such, determine how well the English tests taken by applicants reflect English as used in ‘real life’ encounters. Our investigation will help us to identify the key communication skills required to deliver effective clinical care and allow us to support industrial partners with specific recommendations for language assessment and training for healthcare staff.

With a broad focus on the various participant roles within the patient journey through Emergency Departments, we are investigating how the language used by patients, nurses, doctors and other hospital staff reflects their various responsibilities and status. Specifically, we focus on the following aspects of language:

Questions: which participants ask questions throughout the encounter? How are they phrased and to what do they refer? How do health professionals check understanding?

Directives: how do health professionals issue instructions? What types of mitigation or hedging are used?

Openings: how do the participants introduce themselves and establish their roles? Do health professionals use names/titles?

Pronouns: how do participants establish and maintain individual/collective identities through the use of pronouns?

Small talk: how and when do health professionals engage in small talk with patients? Or with other health professionals?

Empathy: how do we evidence expressions of empathy in the data? What kinds of empathy phrases do we observe and does this differ according to role?

Our approach is designed to identify those recurring interactional features of Emergency Department encounters that can help inform the teaching and assessment procedures that prepare candidates for the ‘real world’ of healthcare communication.

Team

Dr Dana Gablasova (https://www.lancaster.ac.uk/linguistics/about/people/dana-gablasova) (Lead Investigator)

Dr Luke Collins (https://www.lancaster.ac.uk/linguistics/about/people/luke-collins) (Senior Research Associate)

Dr Vaclav Brezina (https://www.lancaster.ac.uk/linguistics/about/people/vaclav-brezina) (Co-Investigator)

Dr John Pill (https://www.lancaster.ac.uk/linguistics/about/people/john-pill) (Co-Investigator)

Covid-19 and the International Baccalaureate

A month passed, but yet our pain hasn’t diminished and justice unserved (#ibscandal, Aug 6)

Three months ago, when I wrote my introductory post for the CASS blog, I had a clear research plan for my SSHRC (Canada’s Social Sciences & Humanities Research Council) postdoctoral fellowship, which involved examining IB discourses in a large corpus of global (English) newspapers to see how these compared to IB discourses in Canada. However, that plan took a completely new and unexpected turn last month when Covid-19 and the IB collided.

The word “unprecedented” has been used a great deal in connection with our current Covid-19 world. While readers of history may raise a sceptical eyebrow about exactly how unprecedented this situation is, it does apply to the IB organization and the events unfolding globally in relation to the May 2020 final examination results which were released on July 6. This year has been unlike any other year in the organization’s 52-year history because for the first time, the high-stakes IB diploma program final exams were cancelled due to Covid-19. The announcement was made on March 23, further elaborated by the Director General Siva Kumari on March 24, with a follow-up statement on May 13 describing in detail the alternate assessment model to be used.

On July 6, the IB organization published the final results for 174,355 IB Diploma Program (DP) and Career-related Program (CP) students with great fanfare. Of these, 170,343 were DP candidates from 146 countries, whose results would most likely be linked to university admission. Congratulatory messages were splashed on the IB organization website and Twitter feeds, celebrating the triumph of the Class of 2020 for their great achievements in such a difficult year. Messages from the Director General, Siva Kumari, Deputy Director Sally Holloway, Chief Assessment Officer Paula Wilcock and representatives from IB schools around the world joined together in their praise for this cohort, who had been forced to adapt to a new and fluid situation.

But a problem was brewing that was not evident from the IB organization’s celebratory communication. Reports began to emerge from a variety of sources (e.g., Wired, Reuters, TES, Financial Times, Bloomberg) about issues with IB final results, which turned out to be lower than many had expected and thus put students’ university admission and/or scholarships in jeopardy. Within four days of the release of results, an online petition calling for “Justice for May 2020 IB Graduates”, with the hashtag #ibscandal, had already collected 15,000 signatures. Government bodies such as Ofqual (Office of Qualifications and Examinations Regulation) in the UK and the Data Protection Authority in Norway also became involved in seeking clarification regarding the IB organization’s grading system. Despite such wide coverage, the IB website released only a single statement on July 15 about the results, and the @iborganization Twitter account remained largely inactive, with nine posts on July 6 in connection to the results, and the next post on July 20 saying: “In response to the enquiries received, we share further clarity regarding our awarding model for the May 2020 session. We are in direct communication with schools, providing support options including a new process to review extraordinary cases. Learn more: bit.ly/3hdiJnC” (https://mobile.twitter.com/iborganization/status/1285175265027665920).

Over the past month, the #ibscandal hashtag has evolved to become a space where not only students, parents and teachers voice their opinions (e.g., the quote at the start), but also where key information is exchanged, such as newly published articles or videos. There are also academics and journalists posting on this site, most recently a professor from New York University looking to talk to students “affected by the 2020 #ibscandal”. And on July 30, there was an announcement saying that the IB controversy was now on Wikipedia. In sum, there is a stark contrast between the IB organization’s silence on anything to do with results on one side, while anger and frustration mount on the other.

Meanwhile, in Canada, no coverage of these events can be found anywhere. This is curious since Canada not only has the second highest number of DP candidates in the world (11,962 reported for the May 2020 examination session) but ranks as the number one “destination for IB transcripts of any university in the world” (Arida, 2016). It would seem reasonable to expect that there might be some interest in the events taking place globally. Just to be sure I wasn’t missing anything, I conducted a search for international AND baccalaureate on Canadian News stream, a database containing over 280 news sources. Of the 17 results for the month of July, three were not about the IB, two mentioned the IB in passing as part of a person’s qualifications, one reported on a school in Vancouver going ahead with its plans to become an IB school, and 10 reported on complaints against Canada’s Governor General Julie Payette and her assistant, who had been friends “going back to their days in an international baccalaureate program decades ago”. The one remaining article, from July 21 (over two weeks after the IB results were published), is a reprint of the Reuters story Global exam grading algorithm under fire for suspected bias by A. A. Schapiro. In other words, there is no original Canadian news story even though the topic is clearly newsworthy.

So like everyone else who is interested in this topic, I am a regular visitor to the #ibscandal hashtag, observing events unfold in real-time. As a result I’ve noticed some rather interesting developments over the past four weeks which I hope to explore further using corpus tools and methods. As they often say here at CASS, watch this space!

Update: Since this piece was written, the IB organization issued a statement on August 17 explaining changes to their assessment model in light of data and evidence they received from schools. The announcement also appeared on the organization’s Twitter feed (https://mobile.twitter.com/iborganization/status/1295269390540210176).

Time to Celebrate: Trinity Lancaster Corpus

On Wednesday 30 October, the ESRC Centre for Corpus Approaches to Social Science (CASS) organised a small get-together in its new location, Bailrigg House, to celebrate the research that is being carried out at the centre. Specifically, on this occasion, we wanted to highlight the Trinity Lancaster Corpus, a corpus of spoken learner English built in collaboration between Lancaster University and Trinity College London.

Cutting the cake with the Trinity Lancaster Corpus logo

We are really proud of the corpus, which is the largest learner corpus of its kind. It took us over five years to complete this part of the project. Here are a few numbers that describe the Trinity Lancaster Corpus:

  • Over 2,000 transcripts
  • Over 4.2 million words
  • Over 3,500 hours of transcription time
  • Over 10 L1 and cultural backgrounds
  • Up to four speaking tasks

A balanced sample of the corpus is now available for online searching via TLC Hub (password: Lancaster1964). To read more about the corpus and its development, check out this article in the International Journal of Learner Corpus Research:

Gablasova, D., Brezina, V., & McEnery, T. (2019). The Trinity Lancaster Corpus: Development, Description and Application. International Journal of Learner Corpus Research, 5(2), 126-158. [open access]

A new special issue of the journal featuring articles on various aspects of learner language, which use the Trinity Lancaster Corpus as their primary data source, is available from this link.

Table of contents of the special issue of the International Journal of Learner Corpus Research

A cake to celebrate the Trinity Lancaster Corpus

Celebrations at CASS

Celebrations at CASS (posters featuring research on TLC in the background)

CASS is strengthening its links with colleagues at the University of Mosul in Iraq

As reported in the media, in recent months we have been delighted to support staff and students at the University of Mosul in Iraq who are rebuilding the Department of English after the devastation caused by the so-called Islamic State group. Via the CorpusMOOC and other forms of long-distance support, we have begun to interact with colleagues in Mosul, and to appreciate both the size of the task ahead of them and their determination to succeed. We are now in the process of arranging a month-long visit to Lancaster from two Mosul academics, so that we can strengthen our ties, including by exploring joint projects. Watch this space for updates on the visit and our future joint activities.

Introductory Blog – Gavin Brookes

This is the second time I have been a part of CASS, which means that this is the second time I’ve written one of these introductory blog pieces. I first worked in CASS in 2016, on an eight-month project with Paul Baker where we looked at the feedback that patients gave about the NHS in England. This was a really fun project to work on – I enjoyed being a part of CASS and working with Paul and made some great friends in the Centre with whom I’m still in contact to this day. Since leaving CASS in October 2016, I completed my PhD in Applied Linguistics in the School of English at the University of Nottingham, which examined the ways that people with diabetes and eating disorders construct their illnesses and identities in online support groups. Following my PhD, I stayed in the School of English at Nottingham, working as a Postdoctoral Research Fellow in the School’s Professional Communication research and consultancy unit.

As you might have guessed from the topic of my doctoral project and my previous activity with CASS, my main research interests are in the areas of corpus linguistics and health communication. I am therefore very excited to return to the Centre now, with its new focus on the application of corpora to the study of health communication. I’m currently working on a new project within the Centre, Representations of Obesity in the News, which explores the ways that obesity and people affected by obesity are represented in the media, focussing in particular on news articles and readers’ responses. I’m very excited to be working on this important project. Obesity is a growing and seemingly ever-topical public health concern, not just in the UK but globally. However, the media’s treatment of the issue can often be stigmatising, making it quite deserving of scrutiny! Yet, our aim in this project isn’t just to take the media to task, but to eventually work with media outlets to advise them on how to cover obesity in a way that is more balanced and informative and, crucially, less stigmatising for people who are affected by it. In this project, we’re also working with obesity charities and campaign groups, which provides a great opportunity to make sure that the focus of our research is not just fit for academic journals but is relevant to people affected by this issue and so can be applied in the ‘real world’, as it were.

So, to finish on more of a personal note, the things I said about myself the last time I wrote one of these blog posts are still true; I still like walking, I still travel lots, I still read fantasy and science fiction, I still do pub quizzes, my football team are still rubbish and I don’t think I’ve changed that much since the photo used in that piece was taken… Most of all, though, it still excites me to be a part of CASS and I am absolutely delighted to be back.

 

Learn about the BNC2014, scan a book sample and contribute to the corpus…

On Saturday 12 May 2018, CASS hosted a small training event at Lancaster University for a group of participants from different universities in the UK. We talked about the BNC2014 project and discussed both the theoretical underpinnings and the practicalities of corpus design and compilation. Slides from the event are available as a PDF here.

The participants then got hands-on experience of what is involved in compiling a large general corpus such as the BNC2014. They selected and scanned samples of books from current British fiction, poetry and a range of non-fiction books (history, popular science, hobbies etc.). Once processed, these samples will become a part of the written BNC2014.

Here are some pictures from the event:

Carmen Dayrell and Vaclav Brezina before the event

Elena Semino welcoming participants

In the computer lab: Abi Hawtin helping participants


A box full of books

If you are interested in contributing to the written BNC2014, go to the project website to find out about different ways in which you can participate in this exciting project.

The event was supported by ESRC grant no. EP/P001559/1.

40th Anniversary of the Language and Computation Group

Mahmoud

Recently I was given the chance to attend the 40th anniversary of the Language and Computation (LAC) group at the University of Essex. As an Essex alumnus I was invited to present my work with CASS on Financial Narrative Processing (FNP), part of the ESRC-funded project. Slides are available online here.

The event celebrated 40 years of the Language and Computation (LAC) group: an interdisciplinary group created to foster interaction between researchers working on Computational Linguistics within the University of Essex.

There were 16 talks by Essex University alumni and connections including Yorick Wilks, Patrick Hanks, Stephen Pulman and Anne de Roeck. http://lac.essex.ac.uk/2016-computationallinguistics40

The two-day workshop started with Doug Arnold from the Department of Language and Linguistics at Essex. He began by presenting the history and origins of the LAC group, which started with the arrival of Yorick Wilks in the late 70s and others from Language and Linguistics, including Stephen Pulman, Mike Bray, Ray Turner and Anne de Roeck. According to Doug, the introduction of the cognitive studies center and the Eurotra project in the 80s led to the introduction of the Computational Linguistics MA, paving the way towards the emergence of Language and Computation. Something I always wondered about.

The workshop referred to the beginnings of some of the most influential conferences and associations in computational linguistics such as CoLing, EACL and ESSLLI. It also showed the influence of world events around that period and the struggle researchers and academics had to go through, especially during the Cold War and the many university crises around the UK during the 80s and the 90s. Having finished my PhD in 2012, it had never crossed my mind how difficult it would have been for researchers and academics to progress under such trying circumstances during that time.

Doug went on to point out how the introduction of the World Wide Web in the mid 90s and the development of technology and computers helped to rapidly advance and reshape the field. This helped in closing the gap between Computation and Linguistics and in easing the problem of field identity between Computational Linguists coming from a Computing or a Linguistics background. We now live surrounded by fast-moving technologies and a solid network infrastructure, which means that communication and data processing are no longer a problem. I was astonished when Stephen Pulman mentioned how they used to wait a few days for the only machine in the department to compile a few lines of LISP code.

The arrival of Big Data processing around 2010 and the rapidly growing need for resourcing, crowd-sourcing and interpreting big data added more challenges but also interesting opportunities for computational linguists. Something I very much agree with considering the vast amount of data available online these days.

Doug ended his talk by pointing out that in general Computational Linguistics is a difficult field; computational linguists are expected to be experts in many areas, and he concluded that training computational linguists is a challenging and difficult task. As a computational linguist, this rings a bell. For example, as someone from a computing background, I find it difficult to understand how part-of-speech taggers work without being versed in the grammar of the language under study.

Doug’s talk was followed by compelling and very informative talks from Yorick Wilks, Mike Rosner and Patrick Hanks.

Yorick opened with “Linguistics is still an interesting topic”, narrating his experience of moving from Linguistics towards Computing and the challenge imposed by the UK system compared to other countries such as France, Russia and Italy, where Chomsky had little influence. This reminded me of Peter Norvig’s response to Chomsky’s criticism of empirical theory, where he said, and I quote: “I think Chomsky is wrong to push the needle so far towards theory over facts”.

In his talk, Yorick referred to Lancaster University and the remarkable work by Geoffrey Leech and the build up of the CLAWS tagger, which was one of the earliest statistical taggers to ever reach the USA.

“What is meaning?” was the opening of Patrick Hanks’s talk, which went on to discuss word ambiguity, saying: “most words are hopelessly ambiguous!”. Patrick briefly discussed the ‘double helix’ rule system or the Theory of Norms and Exploitations (TNE), which enables creative use of language when speakers and writers make new meanings, while at the same time relying on a core of shared conventions for mutual understanding. His work on patterns and phraseology is of great interest in an attempt to answer the “why this perfectly valid English sentence fits in a single pattern?” question.

This was followed by interesting talks from ‘Essexians’ working in different universities and firms across the globe. These included recent work on Computational Linguistics (CL), Natural Language Processing (NLP) and Machine Learning (ML). One of those was a collaboration between Essex University and Signal – a startup company in London.

The event closed with more socialising, drinks and dinner at a Nepalese restaurant in Colchester, courtesy of the LAC group.

In general I found the event very interesting, well organised and rich in historical evidence on the beginnings of Language and Computation. It was also of great interest to learn about current work and the state of the art in CL, NLP and ML presented by the event attendees.

I would very much like to thank the Language and Computation group at Essex University for the invitation and for their time and effort in organising this wonderful event.

Mahmoud El-Haj

Senior Research Associate

CASS, Lancaster University

@DocElhaj

http://www.lancaster.ac.uk/staff/elhaj