Corpus Linguistics + Language Testing Workshop

‘Corpus-Based Approaches to Language Testing’ was held on Thursday 7th March 2019 and was the second event organised by the Trinity Lancaster Corpus research group at Lancaster University. This event was supported by an ESRC International Networking grant and hosted by the ESRC Centre for Corpus Approaches to Social Science (CASS).

We welcomed 45 people from over 20 different institutions, who braved the wild Lancastrian weather and travelled from as far as Edinburgh and Bristol to join us. The afternoon started off with a tasty lunch while our eager participants networked with others in the diverse group. From graduate students and researchers to language teachers and item writers, participants brought a variety of expertise to the day, all interested in language testing and all keen to learn how corpus linguistics can be applied to the field.

Once the cake was finished, the event moved on to the first lecture of the afternoon, given by Dr Dana Gablasova. She gave an overview of corpus-based methods of language analysis before narrowing in on how language testing can best use these tools. Dr Gablasova also provided examples from relevant corpora, such as the Trinity Lancaster Corpus, which had been transcribed and compiled just metres above the lecture room. Test development, test validation and examiner training were also key topics as Dr Gablasova set the tone for the remainder of the afternoon.

Dr Dana Gablasova discussing the development of teaching materials using corpora.


After a brief pause to reflect, the room was again packed with a rapt audience as our guest speaker, Dr Shelley Staples from the University of Arizona, USA, began her talk. Building on the previous overview of the day’s focus, Dr Staples took the time to go deeper into one method, multidimensional analysis (MDA), and explained how it can be used for test validation in language testing as part of corpus-based register analysis. She further examined how certain linguistic features can be used to distinguish different genres/registers, which is important for many reasons; for example, to understand how test prompts can vary and elicit different language performance. Dr Staples ended by summarising the implications of this kind of analysis for language testing.

Dr Shelley Staples engaging a full house in her lecture on using multidimensional analysis.


After further refreshments of tea, coffee and biscuits, it was time to put theory into practice! After a short walk to a computer lab, the participants settled in to hear the third and final talk of the day from Dr Vaclav Brezina. This practical session was based around two freely available resources developed at Lancaster University: #LancsBox, a software package designed to analyse language from new and/or existing corpora with a focus on visualising data, and Lancaster Stats Tools Online, a practical support website for conducting a variety of statistical techniques associated with the analysis of corpus data. Dr Brezina began by smoothly picking up Dr Staples’ key topic of corpus-based register analysis and explained how the powerful Whelk tool within #LancsBox can be used to analyse the distribution of words and phrases in different genres/registers of language. Once our participants had successfully mastered this tool, they were directed to Lancaster Stats Tools Online to conduct their MDA in a few easy steps.

The Trinity Lancaster Corpus student researchers were on hand to answer any questions as participants worked their way through the session’s activities.

At the end of the session, Dr Brezina reminded the participants that MDA is as complex as statistical analyses get, but that it had been made user-friendly thanks to #LancsBox and Lancaster Stats Tools Online. Finally, we had some lovely feedback from participants, who found the sessions engaging and thought-provoking. As well as conducting corpus-based register analysis, they were excited at the prospect of using the many other features of #LancsBox, including for analysing their own students’ writing. Throughout the afternoon, many hundreds more also joined us from all over the world via our Twitter live stream.

We would like to sincerely thank the speakers, participants and organisers, who were all instrumental in creating such a successful afternoon of theoretical conversations and practical applications. The recordings from the day can be found below. The practical handout can also be found here.

Part I: Lectures

Part II: Workshop