Further Trinity Lancaster Corpus research: Examiner strategies

This month saw a further development in the corpus analyses: the examiners. Let me introduce myself, my name is Cathy Taylor and I’m responsible for examiner training at Trinity and was very pleased to be asked to do some corpus research into the strategies the examiners use when communicating with the test takers.

In the GESE exams the examiner and candidate co-construct the interaction throughout the exam. The examiner doesn’t work from a rigid interlocutor framework provided by Trinity but instead has a flexible test plan which allows them to choose from a variety of questioning and elicitation strategies. They can then respond more meaningfully to the candidate and cover the language requirements and communication skills appropriate for the level. The rationale behind this approach is to reflect as closely as possible what happens in conversations in real life. Another benefit of the flexible framework is that the examiner can use a variety of techniques to probe the extent of the candidate’s competence in English and allow them to demonstrate what they can do with the language. If you’re interested more information can be found in Trinity’s speaking and listening tests: Theoretical background and research.

After some deliberation and very useful tips from the corpus transcriber, Ruth Avon, I decided to concentrate my research on the opening gambit for the conversation task at Grade 6, B1 CEFR. There is a standard rubric the examiner says to introduce the subject area ‘Now we’re going to talk about something different, let’s talk about…learning a foreign language.’  Following this, the examiner uses their test plan to select the most appropriate opening strategy for each candidate. There’s a choice of six subject areas for the conversation task listed for each grade in the Exam information booklet.

Before beginning the conversation examiners have strategies to check that the candidate has understood and to give them thinking time. The approaches below are typical.

  1. E: ‘Let’s talk about learning a foreign language…’
    C: ‘yes’
    E:Do you think English is an easy language?’ 
  1. E: ‘Let ‘s talk about learning a foreign language’
    C: ‘It’s an interesting topic’
    E: ‘Yes uhu do you need a teacher?
  1. It’s very common for the examiner to use pausing strategies which gives thinking time:
    E: ‘Let ‘s talk about learning a foreign language erm why are you learning English?’
    C: ‘Er I ‘m learning English for work erm I ‘m a statistician.’

There are a range of opening strategies for the conversation task:

  • Personal questions: ‘Why are you learning English?’ ‘Why is English important to you?’
  • More general question: ‘How important is it to learn a foreign language these days?’
  • The examiner gives a personal statement to frame the question: ‘I want to learn Chinese (to a Chinese candidate)…what do I have to do to learn Chinese?’
  • The examiner may choose a more discursive statement to start the conversation: ‘Some people say that English is not going to be important in the future and we should learn Chinese (to a Chinese candidate).’
  • The candidate sometimes takes the lead:
  • Examiner: ‘Let’s talk about learning a foreign language’
  • Candidate: ‘Okay, okay I really want to learn a lo = er learn a lot of = foreign languages’

A salient feature of all the interactions is the amount of back channelling the examiners do e.g. ‘erm, mm’  etc. This indicates that the examiner is actively listening to the candidate and encouraging them to continue. For example:

E: ‘Let’s talk about learning a foreign language, if you want to improve your English what is the best way?
C: ‘Well I think that when you see programmes in English’
E: ‘mm
C: ‘without the subtitles’
E: ‘mm’
C: ‘it’s a good way or listening to music in other language
E: ‘mm
C: ‘it’s a good way and and this way I have learned too much

When the corpus was initially discussed it was clear that one of the aims should be to use the findings for our examiner professional development programme.  Using this very small dataset we can develop worksheets which prompt examiners to reflect on their exam techniques using real examples of examiner and candidate interaction.

My research is in its initial stages and the next step is to analyse different strategies and how these validate the exam construct. I’m also interested in examiner strategies at the same transition point at the higher levels, i.e. grade 7 and above, B2, C1 and C2 CEFR. Do the strategies change and if so, how?

It’s been fascinating working with the corpus data and I look forward to doing more in the future.

Continue reading

The heart of the matter …

TLC-LogoHow wonderful it is to get to the inner workings of the creature you helped bring to life! I’ve just spent a week with the wonderful – and superbly helpful – team at CASS devoting time to matters on the Trinity Lancaster Spoken Corpus.

Normally I work from London situated in the very 21st century environment of the web – I plan, discuss and investigate the corpus across the ether with my colleagues in Lancaster. They regularly visit us with updates but the whole ‘system’ – our raison d’etre if you like – sits inside a computer. This, of course, does make for very modern research and allows a much wider circle of access and collaboration. But there is nothing like sitting in the same room as colleagues, especially over the period of a few days, to test ideas, to leap connections and to get the neural pathways really firing.


It’s been a stimulating week not least because we started with the wonderful GraphColl, a new collocation tool which allows the corpus to come to life before our eyes. As the ‘bubbles’ of lexis chase across the screen searching for their partners, they pulse and bounce. Touching one of them lights up more collocations, revealing the mystery of communication. Getting the number right turns out to be critical in producing meaningful data that we can actually read – too loose and we end up with a density we cannot untangle; the less the better seems to be the key.  It did occur to me that finally language had produced something that could contribute to the Science Picture Library https://www.sciencephoto.com/ where GraphColl images could complement the shots of language activity in the brain. I’ve been experimenting with it this week – digging out question words from part of the corpus to find out how patterned they are – more to come.

We’ve also been able to put more flesh on the bones of an important project developed by Vaclav Brezina – how to make the corpus meaningful for teachers (and students). Although we live in an era where the public benefit of science is rightly foregrounded, it can be hard sometimes to ‘translate’ the science and complexity of the supporting technology so that it is of real value to the very people who created the corpus. Vaclav has been preparing a series of extracts of corpus data that can come full circle back into the classroom by showing teachers and their students the way that language works – not in the textbooks but in real ‘lingua franca’ life. In other words, demonstrating the language that successful learners use to communicate in global contexts. This is going to be turned into a series of teaching materials with the quality and relevance being assured by crowdsourcing teaching activities from the teachers themselves.

time Collocates of time in the GESE interactive task

Meanwhile I am impressed by how far the corpus – this big data – is able to support Trinity by helping to build robust validity arguments for the GESE test.  This is critical in helping Trinity’s core audience – our test takers –  to understand why should I do this test, what will the test demonstrate, what effect will it have on my learning, is it fair?  All in all a very productive week.

Trinity Lancaster Spoken Learner Corpus: A milestone to celebrate

On Monday 19 May we came together to celebrate the completion of the first part of the Trinity Lancaster Spoken Learner Corpus project. The transcription of our 2012 dataset is now complete and the corpus comprises 1.5 million running words. The Trinity Lancaster Spoken Learner Corpus represents a balanced sample of learner speech from six different countries (Italy, Spain, Mexico, India, China and Sri Lanka) covering the B1.2 – C2 levels of the Common European Framework (CEFR). Below are some pictures from our small celebration.


trinity1 trinity2

We are continuing with the corpus development adding more data from our 2014 dataset so there is still a lot of work to be done. However, we are really excited about the possibilities of applied linguistic and language testing research based on this unique dataset.

You can read more about the Trinity Lancaster Spoken Learner Corpus in the AEA-Europe newsletter report.