Further Trinity Lancaster Corpus research: Examiner strategies

This month saw a further development in the corpus analyses: the examiners. Let me introduce myself: my name is Cathy Taylor, and I’m responsible for examiner training at Trinity. I was very pleased to be asked to do some corpus research into the strategies the examiners use when communicating with the test takers.

In the GESE exams the examiner and candidate co-construct the interaction throughout the exam. The examiner doesn’t work from a rigid interlocutor framework provided by Trinity but instead has a flexible test plan which allows them to choose from a variety of questioning and elicitation strategies. They can then respond more meaningfully to the candidate and cover the language requirements and communication skills appropriate for the level. The rationale behind this approach is to reflect as closely as possible what happens in conversations in real life. Another benefit of the flexible framework is that the examiner can use a variety of techniques to probe the extent of the candidate’s competence in English and allow them to demonstrate what they can do with the language. If you’re interested, more information can be found in Trinity’s speaking and listening tests: Theoretical background and research.

After some deliberation and very useful tips from the corpus transcriber, Ruth Avon, I decided to concentrate my research on the opening gambit for the conversation task at Grade 6 (CEFR B1). There is a standard rubric the examiner uses to introduce the subject area: ‘Now we’re going to talk about something different, let’s talk about… learning a foreign language.’ Following this, the examiner uses their test plan to select the most appropriate opening strategy for each candidate. There’s a choice of six subject areas for the conversation task listed for each grade in the Exam information booklet.

Before beginning the conversation examiners have strategies to check that the candidate has understood and to give them thinking time. The approaches below are typical.

  1. E: ‘Let’s talk about learning a foreign language…’
    C: ‘yes’
    E: ‘Do you think English is an easy language?’
  2. E: ‘Let’s talk about learning a foreign language’
    C: ‘It’s an interesting topic’
    E: ‘Yes uhu do you need a teacher?’
  3. It’s very common for the examiner to use pausing strategies which give thinking time:
    E: ‘Let’s talk about learning a foreign language erm why are you learning English?’
    C: ‘Er I’m learning English for work erm I’m a statistician.’

There is a range of opening strategies for the conversation task:

  • Personal questions: ‘Why are you learning English?’ ‘Why is English important to you?’
  • More general questions: ‘How important is it to learn a foreign language these days?’
  • The examiner gives a personal statement to frame the question: ‘I want to learn Chinese (to a Chinese candidate)… what do I have to do to learn Chinese?’
  • The examiner may choose a more discursive statement to start the conversation: ‘Some people say that English is not going to be important in the future and we should learn Chinese (to a Chinese candidate).’
  • The candidate sometimes takes the lead:
    Examiner: ‘Let’s talk about learning a foreign language’
    Candidate: ‘Okay, okay I really want to learn a lo = er learn a lot of = foreign languages’

A salient feature of all the interactions is the amount of back channelling the examiners do e.g. ‘erm, mm’  etc. This indicates that the examiner is actively listening to the candidate and encouraging them to continue. For example:

E: ‘Let’s talk about learning a foreign language, if you want to improve your English what is the best way?’
C: ‘Well I think that when you see programmes in English’
E: ‘mm’
C: ‘without the subtitles’
E: ‘mm’
C: ‘it’s a good way or listening to music in other language’
E: ‘mm’
C: ‘it’s a good way and and this way I have learned too much’

When the corpus was initially discussed, it was clear that one of the aims should be to use the findings for our examiner professional development programme. Using this very small dataset we can develop worksheets which prompt examiners to reflect on their exam techniques using real examples of examiner and candidate interaction.

My research is in its initial stages and the next step is to analyse different strategies and how these validate the exam construct. I’m also interested in examiner strategies at the same transition point at the higher levels, i.e. Grade 7 and above (CEFR B2, C1 and C2). Do the strategies change and if so, how?

It’s been fascinating working with the corpus data and I look forward to doing more in the future.

Continue reading

TLC and innovation in language testing

One of Trinity College London’s objectives in investing in the Trinity Lancaster Spoken Corpus has been to share findings with the language assessment community. The corpus allows us to develop an innovative approach to validating test constructs and offers a window into the exam room, so we can see how test takers utilise their language skills in managing the series of test tasks.

Recent work by the CASS team in Lancaster has thrown up a variety of features that illustrate how test takers voice their identity in the test, how they manage interaction through a range of strategic competences and how they use epistemic markers to express their point of view and negotiate a relationship with the examiner (for more information see Gablasova et al. 2015). I have spent the last few months disseminating these findings at a range of language testing conferences and have found that the audiences have been fascinated by the findings.

We have presented findings at BAAL TEASIG in Reading, at EAQUALS in Lisbon and at EALTA in Valencia. Audiences ranged from assessment experts to teacher educators and classroom practitioners, and there was great interest both in how the test takers manage the exam and in the manifestations of L2 language. Each presentation was tailored to the audience and the theme of the conference. In separate presentations, we covered how assessments can inform classroom practice, how the data could inform the type of feedback we give learners and how the data can be used to help validate aspects of the test construct. The feedback has been very positive, urging us to investigate further. Comments have praised the extent and quality of the corpus and range from the observation that the evidence “is something that we have long been waiting for” (Dr Parvaneh Tavakoli, University of Reading) to musings on what some of the data might mean both for how we assess spoken language and for the implications for the classroom. It has certainly opened the door on the importance of strategic and pragmatic competences as well as validating Trinity’s aims to allow the test taker to bring themselves into the test. The excitement spilled over into some great tweets. There is general recognition that the data offers something new – sometimes confirming what we suspected and sometimes – as with all corpora – refuting our beliefs!

We have always recognised that the data is constrained by the semi-formal context of the test, but the fact that each test is structured but not scripted, and has tasks which represent language pertinent to communicative events in the wider world, allows the test taker to produce language which is more reflective of naturally occurring speech than many other oral tests. It has been enormously helpful to have feedback from the audiences, who have fully engaged with the issues raised and highlighted aspects we can investigate in greater depth, as well as raising features they would like to know more about. These features are precisely those that the research team wishes to explore in order to develop ‘a more fine-grained and comprehensive understanding of spoken pragmatic ability and communicative competence’ (Gablasova et al. 2015: 21).

One of the next steps is to show how this data can be used to develop and support performance descriptors. Trinity is confident that the features of communication which the test takers display are captured in its new Integrated Skills in English exam, validating claims that Trinity assesses real-world communication.

From Corpus to Classroom 2

There is great delight that the Trinity Lancaster Corpus is providing so much interesting data that can be used to enhance communicative competences in the classroom. From Corpus to Classroom 1 described some of these findings. But how exactly do we go about ‘translating’ this for classroom use so that it can be used by busy teachers with high-pressure curricula to get through? How can we be sure we enhance rather than problematize the communicative feature we want to highlight?

Although the Corpus data comes from a spoken test, we want to use it to illustrate wider pragmatic features of communication. The data fascinates students, who are entranced to see what their fellow learners do, but how does it help their learning? The first step is to send the research outputs to an experienced classroom materials author to see what they suggest.

Here’s how our materials writer, Jeanne Perrett, went about this challenging task:

As soon as I saw the research outputs from TLC, I knew that this was something really special; proper, data-driven learning on how to be a more successful speaker. I could also see that the corpus scripts, as they were, might look very alien and quirky to most teachers and students. Speaking and listening texts in coursebooks don’t usually include sounds of hesitation, people repeating themselves, people self-correcting or even asking ‘rising intonation’ questions. But all of those things are a big part of how we actually communicate, so I wanted to use the original scripts as much as possible. I also thought that learners would be encouraged by seeing that you don’t have to speak in perfectly grammatical sentences, that you can hesitate and you can make some mistakes but still be communicating well.

Trinity College London commissioned me to write a series of short worksheets, each one dealing with one of the main research findings from the Corpus, and intended for use in the classroom to help students prepare for GESE and ISE exams at a B1 or B2 level.

I started each time with extracts from the original scripts from the data. Where I thought that the candidates’ mistakes would hinder the learner’s comprehension (unfinished sentences for example), I edited them slightly (e.g. with punctuation). But these scripts were not there for comprehension exercises; they were there to show students something that they might never have been taught before.

For example, sounds of hesitation: we all know how annoying it is to listen to someone (native and non-native speakers alike) continually erm-ing and er-ing in their speech, and the data showed that candidates were hesitating too much. But we rarely, if ever, teach our students that it is in fact okay and indeed natural to hesitate while we are thinking of what we want to say and how we want to say it. What they need to know is that, like the more successful candidates in the data, there are other words and phrases that we can use instead of erm and er. So one of the worksheets shows how we can use hedging phrases such as ‘well…’ or ‘like…’ or ‘okay…’ or ‘I mean…’ or ‘you know…’.

The importance of taking responsibility for a conversation was another feature to emerge from the data and again, I felt that these corpus findings were very freeing for students; that taking responsibility doesn’t, of course, mean that you have to speak all the time but that you also have to create opportunities for the other person to speak and that there are specific ways in which you can do that such as making active listening sounds (ah, right, yeah), asking questions, making short comments and suggestions.

Then there is the whole matter of how you ask questions. The corpus findings show that there is far less confusion in a conversation when properly formed questions are used. When someone says ‘You like going to the mountains?’ the question is not as clear as when they say ‘Do you like going to the mountains?’ This might seem obvious, but pointing it out, showing that less checking of what has been asked is needed when questions are direct ones, is, I think, very helpful to students. It might also be a consolation: all those years of grammar exercises really were worth it! ‘Do you know how to ask a direct question?’ ‘Yes, I do!’

These worksheets are intended for EFL exam candidates but the more I work on them, the more I think that the Corpus findings could have a far wider reach. How you make sure you have understood what someone is saying, how you can be a supportive listener, how you can make yourself clear, even if you want to be clear about being uncertain; these are all communication skills which everyone needs in any language.



Syntactic structures in the Trinity Lancaster Corpus

We are proud to announce collaboration with Markus Dickinson and Paul Richards from the Department of Linguistics, Indiana University, on a project that will analyse syntactic structures in the Trinity Lancaster Corpus. The focus of the project is to develop a syntactic annotation scheme for spoken learner language and apply this scheme to the Trinity Lancaster Corpus, which is being compiled at Lancaster University in collaboration with Trinity College London. The aim is to provide an annotation layer for the corpus that will allow sophisticated exploration of the morphosyntactic and syntactic structures in learner speech. The project will have an impact both on the theoretical understanding of spoken language production at different proficiency levels and on the development of practical NLP solutions for annotation of learner speech. More specific goals include:

  • Identification of units of spoken production and their automatic recognition.
  • Annotation and visualization of morphosyntactic and syntactic structures in learner speech.
  • Contribution to the development of syntactic complexity measures for learner speech.
  • Description of the syntactic development of spoken learner production.
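To make the idea of an ‘annotation layer’ concrete, here is a purely illustrative sketch of what token-level morphosyntactic annotation of a learner utterance might look like. The field names, tag set and the example analysis are my own assumptions for illustration; the project’s actual scheme for spoken learner language is still being developed.

```python
from dataclasses import dataclass

@dataclass
class Token:
    form: str                 # the word as spoken, e.g. 'erm'
    pos: str                  # part-of-speech tag
    head: int                 # index of the syntactic head (-1 for the root)
    deprel: str               # dependency relation to the head
    disfluency: bool = False  # spoken-language layer: filled pause, repeat, etc.

# One possible encoding of the learner utterance "er I 'm learning English"
utterance = [
    Token("er", "UH", 3, "discourse", disfluency=True),
    Token("I", "PRP", 3, "nsubj"),
    Token("'m", "VBP", 3, "aux"),
    Token("learning", "VBG", -1, "root"),
    Token("English", "NNP", 3, "obj"),
]

# The root of the utterance is the token with no head
root = next(t for t in utterance if t.head == -1)
print(root.form)  # learning
```

Marking filled pauses like ‘er’ on a separate disfluency layer, rather than deleting them, is one way such a scheme could keep the spoken record intact while still allowing syntactic queries to ignore them.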


From Corpus to Classroom 1

The Trinity Lancaster Corpus of Spoken Learner English is providing multiple sets of data that can not only be used for validating the quality of our tests but also – and most importantly – to feed back important features of language that can be utilised in the classroom. It is essential that some of our research is focused on how Trinity informs and supports teachers in improving communicative competences in their learners; this forms part of an ongoing project the research team are setting up in order to give teachers access to this information.

Trinity has always been focused on communicative approaches to language teaching and the heart of the tests is about communicative competences. The research team are especially excited to see that the data is revealing the many ways in which test takers use these communicative competences to manage their interaction in the spoken tests. It is very pleasing to see that the corpus evidence not only supports claims that the Trinity tests of spoken language are highly interactive but also establishes some very clear features of effective communication that can be utilised by teachers in the classroom.

The strategies which test takers use to communicate successfully include:

  • Asking more questions

Here the test taker relies less on declarative sentences to move a conversation forward and instead asks clear questions (direct and indirect) that are more immediately accessible to the listener.

  • Demonstrating active listenership through backchannelling

This involves offering more support to the conversational partner by using signals such as okay, yes, uhu, oh, etc to demonstrate engaged listenership.

  • Taking responsibility for the conversation through their contributions

Successful test takers help move the conversation along by creating opportunities with e.g. questions, comments or suggestions that their partner can easily react to.

  • Using fewer hesitation markers

Here the speaker makes sure they keep talking and uses fewer markers such as er, erm which can interrupt fluency.

  • Clarifying what is said to them before they respond

This involves the test taker checking through questions that they have understood exactly what has been said to them.

Trinity is hopeful that these types of communicative strategies can be investigated across the tests and across the various levels in order to extract information which can be fed back into the classroom.  Teachers – and their learners – are interested to see what actually happens when the learner has the opportunity to put their language into practice in a live performance situation. It makes what goes on in the classroom much more real and gives pointers to how a speaker can cope in these situations.

More details about these points can be found on the Trinity corpus website and classroom teaching materials will be uploaded shortly to support teachers in developing these important strategies in their learners.

Also see CASS briefings for more information on successful communication strategies in L2.

The heart of the matter …

How wonderful it is to get to the inner workings of the creature you helped bring to life! I’ve just spent a week with the wonderful – and superbly helpful – team at CASS devoting time to matters on the Trinity Lancaster Spoken Corpus.

Normally I work from London, situated in the very 21st century environment of the web – I plan, discuss and investigate the corpus across the ether with my colleagues in Lancaster. They regularly visit us with updates but the whole ‘system’ – our raison d’être if you like – sits inside a computer. This, of course, does make for very modern research and allows a much wider circle of access and collaboration. But there is nothing like sitting in the same room as colleagues, especially over the period of a few days, to test ideas, to leap connections and to get the neural pathways really firing.


It’s been a stimulating week not least because we started with the wonderful GraphColl, a new collocation tool which allows the corpus to come to life before our eyes. As the ‘bubbles’ of lexis chase across the screen searching for their partners, they pulse and bounce. Touching one of them lights up more collocations, revealing the mystery of communication. Getting the numbers right turns out to be critical in producing meaningful data that we can actually read – set the threshold too loose and we end up with a density we cannot untangle; less, it seems, is better. It did occur to me that finally language had produced something that could contribute to the Science Picture Library https://www.sciencephoto.com/ where GraphColl images could complement the shots of language activity in the brain. I’ve been experimenting with it this week – digging out question words from part of the corpus to find out how patterned they are – more to come.
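Collocation tools like GraphColl build their networks from association measures such as mutual information. As a rough, toy illustration of the kind of statistic behind those bubbles (this is not GraphColl’s implementation, and the window size and example word list are invented for the sketch), a minimal MI calculation looks like this:

```python
import math
from collections import Counter

def mi_score(tokens, node, collocate, window=3):
    """Pointwise mutual information for a node/collocate pair counted
    within a +/- `window` token span: MI = log2(observed / expected)."""
    n = len(tokens)
    freq = Counter(tokens)
    observed = 0
    for i, tok in enumerate(tokens):
        if tok == node:
            # count the collocate in the window either side of the node
            span = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            observed += span.count(collocate)
    if observed == 0:
        return float("-inf")
    # expected co-occurrences if the two words were distributed independently
    expected = freq[node] * freq[collocate] * (2 * window) / n
    return math.log2(observed / expected)

tokens = "do you have time , no time at all , time flies".split()
print(round(mi_score(tokens, "time", "no"), 2))  # 0.42
```

The ‘density we cannot untangle’ problem corresponds to setting the score threshold too low: almost every word pair then counts as a collocation and the graph fills with bubbles, which is why raising the cut-off produces a more readable picture.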

We’ve also been able to put more flesh on the bones of an important project developed by Vaclav Brezina – how to make the corpus meaningful for teachers (and students). Although we live in an era where the public benefit of science is rightly foregrounded, it can be hard sometimes to ‘translate’ the science and complexity of the supporting technology so that it is of real value to the very people who created the corpus. Vaclav has been preparing a series of extracts of corpus data that can come full circle back into the classroom by showing teachers and their students the way that language works – not in the textbooks but in real ‘lingua franca’ life. In other words, demonstrating the language that successful learners use to communicate in global contexts. This is going to be turned into a series of teaching materials with the quality and relevance being assured by crowdsourcing teaching activities from the teachers themselves.

Collocates of ‘time’ in the GESE interactive task

Meanwhile I am impressed by how far the corpus – this big data – is able to support Trinity by helping to build robust validity arguments for the GESE test. This is critical in helping Trinity’s core audience – our test takers – to understand: why should I do this test? What will the test demonstrate? What effect will it have on my learning? Is it fair? All in all, a very productive week.

New CASS Briefing now available — How to communicate successfully in English?

How to communicate successfully in English? An exploration of the Trinity Lancaster Corpus. Many speakers use English as their non-native language (L2) to communicate in a variety of situations: at school, at work or in other everyday situations. As well as needing to master the grammar and vocabulary of the English language, L2 users of English need to know how to react appropriately in different communicative situations. In linguistics, this aspect of language is studied under the label of “pragmatics”. This briefing offers an exploration of the pragmatic features of L2 speech in the Trinity Lancaster Corpus of spoken L2 production.

New resources are being added regularly to the new CASS: Briefings tab above, so check back soon.

Trinity Lancaster Corpus at the International ESOL Examiner Training Conference 2015

On Friday 30th January 2015, I gave a talk at the International ESOL Examiner Training Conference 2015 in Stafford. Every year Trinity College London, CASS’s research partner, organises a large conference for all their examiners, which consists of plenary lectures and individual training sessions. This year, I was invited to speak in front of an audience of over 300 examiners about the latest developments in the learner corpus project. For me, this was a great opportunity not only to share some of the exciting results from the early research based on this unique resource, but also to meet the Trinity examiners; many of them have been involved in collecting the data for the corpus. This talk was therefore also an opportunity to thank everyone for their hard work and wonderful support.

It was very reassuring to see the high level of interest in the corpus project among the examiners, who have a deep insight into the examination process from their everyday professional experience. The corpus, as a body of transcripts from the Trinity spoken tests, in some way reflects this rich experience, offering an overall holistic picture of the exam and, ultimately, of L2 speech in a variety of communicative contexts.

Currently, the Trinity Lancaster Corpus consists of over 2.5 million running words sampling the speech of over 1,200 L2 speakers from eight different L1 and cultural backgrounds. The size itself makes the Trinity Lancaster Corpus the largest corpus of its kind. However, it is not only size that the corpus has to offer. In cooperation with Trinity (and with great help from the Trinity examiners) we were able to collect detailed background information about each speaker in our 2014 dataset. In addition, the corpus covers a range of proficiency levels (B1–C2 of the Common European Framework), which allows us to research spoken language development in a way that has not been previously possible. The Trinity Lancaster Corpus, which is still being developed with an average growth of 40,000 words a week, is an ambitious project: using this robust dataset, we can now start exploring crucial aspects of L2 speech and communicative competence and thus help language learners, teachers and materials developers to make the process of L2 learning more efficient and also (hopefully) more enjoyable. Needless to say, without Trinity as a strong research partner and the support from the Trinity examiners this project wouldn’t be possible.

A Journey into Transcription, Part 4: The Question Question

question: (NOUN) A sentence worded or expressed so as to elicit information.

Since we speak in utterances (not sentences), most forms of punctuation are omitted in this corpus of learner language; the exceptions being apostrophes, hyphens and question marks. 

This blog concerns question marks.  (Warning: there are not many jokes!)

When we started transcription, the convention seemed simple and straightforward: a question mark indicates a question. This is easy to apply when questions are straightforward. For example, the following question types are easy to identify:

  • yes/no questions (do you like chocolate?);
  • wh- questions (where have you been?);
  • tag questions (rock music is popular isn’t it?);
  • either/or questions (did you catch the train or did you fly?).

However, very soon, we found ourselves in debate about whether and where to transcribe question marks in less straightforward utterances. These debates led us to amend the convention and add illustrative examples. In addition, transcribers created a Questions Bank and began to keep a log of decisions made regarding the transcription of question marks; this was done with the aim of achieving the consistency which we anticipate might be vital to researchers in the future.

So here follows a reflection on some of the varied ways in which speakers can elicit a response in spoken discourse, along with remarks on whether or not a question mark is transcribed in context of this corpus.

It is useful to keep two vital rules in mind:

  • For the learner language corpus it is the structure of the utterance that is crucial rather than the expression or tone of voice. 
  • If in doubt, leave it out!
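Those two rules are mechanical enough to express as code. Purely as an illustration of the structural test (this is not the transcribers’ actual tooling, and the word lists and patterns below are deliberately minimal assumptions), a structure-based check might look like:

```python
import re

WH = {"what", "where", "when", "why", "who", "whose", "which", "how"}
AUX = {"do", "does", "did", "is", "are", "was", "were", "can", "could",
       "will", "would", "shall", "should", "have", "has", "had", "may", "might"}

def needs_question_mark(utterance):
    """Apply the structural test: mark an utterance as a question only if
    its wording (not its intonation) is interrogative; if in doubt, leave it out."""
    words = utterance.lower().split()
    if not words:
        return False
    # wh- questions and yes/no questions with auxiliary inversion
    if words[0] in WH or words[0] in AUX:
        return True
    # tag questions: "...isn't it", "...don't you"
    if len(words) >= 2 and words[-1] in {"it", "you", "they", "he", "she", "we"} \
            and re.fullmatch(r"(is|are|do|does|did|was|were|can|could|will|would|have|has)n't",
                             words[-2]):
        return True
    return False

print(needs_question_mark("where have you been"))             # True
print(needs_question_mark("I thought I was late"))            # False
print(needs_question_mark("rock music is popular isn't it"))  # True
```

On this test an utterance like ‘I’m fine and you’ gets no question mark, exactly as the convention requires: it is meaningful without interrogative intonation, so only the structure, not the rising voice, counts.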

Either/Or Adjusted Question

Speaker adjusts wording and question structure remains.

  • so in Indian houses do you also have landline telephones or do they  are they disappearing?

Either/Or Anticipation Question:

Use of ‘or’ suggests a choice of alternatives is going to be presented but the questioner’s voice and pace tails off in anticipation of the listener’s response.

  • do you go to a special school? or… [the ellipsis would not be transcribed in the corpus]

Doubled Up Question

Structurally, there may be two questions but only one question is actually being asked; question mark transcribed at the end.

  • is it important to do school trips do you think?

Rephrased / Clarified Question:

Multiple rephrased/related questions in quick succession; each is structurally complete, eliciting a single response.

  • in what area? in what field? do have you any idea?
  • what are you going to do when you finish at this school? what will you do next?

Wondering Question:

A question word (often ‘what’) within the utterance and transcribed with question mark.

  • it seems to me your class sizes you have what? forty five students in a class it seems to me they are very large

Question Word/Context Question:

Question word followed by context/detail; often for emphasis and expressing shock or surprise.

  • what? they have a party all day
  • when? in the middle of the night

Clarification/Qualification Question:

A question followed by qualifying phrase for emphasis or for clarification; question mark may be transcribed at the end…

  • what about education more broadly more generally?
  • would you make it more fashionable more stylish?

…or in the middle of the utterance.

  • what do you think the biggest problems are in Mumbai? the biggest pollution problems
  • is that your ambition? to design a bicycle

Interrupted (Clause) Question:

A clause inserted mid-question but structure remains and one main question is being asked.

  • what about looking at education not just at your school looking at education in general?

Implied Question:

Interrogative intonation communicates speaker’s aim to elicit information; however, in this corpus we focus solely on structure so no question mark is transcribed.

Useful test: is the utterance meaningful without interrogative intonation?  If so, no question mark is added.

S:            I thought I was late

E:            really

S:            yes I overslept


E:            and how are you today?

S:            I’m fine and you

E:            I’m fine too


E:            any questions for me about your topic

S:            yes have you ever been to New York?

Statement Question:

Again, interrogative intonation communicates speaker’s aim to elicit information but structurally there is no question in the second part of this utterance and so no question mark is transcribed.

E:            so what do you think is the answer then? you think that parents should be at home more

S:            no I think they should have the choice

Unclear Question:

Key words are unclear making question structure incomplete; no question mark is transcribed.

S:            <unclear=can you> repeat the question please

A Complex Utterance with a Question Structure:

A number of self-corrections but the structure of a question exists.

S:            and do you think it’s it’s good to be in to be in touch with many people and to and to and to con= er contact with your friends and erm and at your home for exa= on your home for example?

Interrupted Question:

If the question is interrupted no question mark is transcribed, however, sometimes a short question structure remains.

S:            is he er good enough?  to

E:            mm

S:            you know develop India and make it a superpower

Interrupted Either/Or Question:

What would originally have been a single either/or question is interrupted resulting in two independent question structures which are each transcribed with question marks.

E:            do you think it’s a skill?

S:            erm I think

E:            or can you get better at it?

So this has been a glimpse at some of the many varied ways speakers use language to elicit a response. Time and again we chant our mantra: “If in doubt, leave it out!”

The full version of our Questions Bank is now pretty exhaustive. Generally we find that utterances can be mapped onto existing example structures, so we can be confident that the decision as to whether and where to transcribe the question mark will be consistent with previous decisions. So the Questions Bank, for us, has definitely been a valuable transcription tool.