Data-driven learning: learning from assessment

The process of converting valuable spoken corpus data into classroom materials is not necessarily straightforward, as a recent project conducted by Trinity College London reveals.

One of the buzzwords we increasingly hear from teacher trainers in English Language Teaching (ELT) is data-driven learning. This ties in with other contemporary pedagogies, such as discovery learning. A key component of this is how data from a corpus can be used to inform learning. One of our long-running projects with the Trinity Lancaster Corpus has been to see how we could use the spoken data in the classroom so that students could learn from assessment as well as for assessment. We have reported before (From Corpus to Classroom 1 and From Corpus to Classroom 2) on the research focus on pragmatic and strategic examples. These linguistic features and competences are often not practised – or are only superficially addressed – in course books and yet can be significant in enhancing learners’ communication skills, especially across cultures. Our ambition is to translate the data findings for classroom use, specifically to help teachers improve learners’ wider speaking competences.

We developed a process of constructing sample worksheets based on, and including, the corpus data. The data was contextualized and presented to teachers in order to give them an opportunity to use their expertise in guiding how this data could be developed for, and utilized in, the classroom. So, essentially, we asked teachers to collaborate on checking how useful the data and tasks were and potentially improving these tasks. We also asked teachers to develop their own tasks based on the data and we now have the results of this project.

Overwhelmingly, the teachers were very appreciative of the data and they each produced some great tasks. All of these were very useful for the classroom but they did not really exploit the unique information we identified as being captured in the data. We have started exploring why this might be the case.

What the teachers did was the following:

  • Created noticing and learner autonomy activities with the data (though most tasks would need much more scaffolding).
  • Focused on traditional information about phrases identified in the data, e.g. the strength and weakness of expressions of agreement.
  • Created activities that reflected traditional course book approaches.
  • Created reflective, contextual practice related to the data, although this sometimes became lost in the addition of extra non-corpus texts.

We had expectations that the data would inspire activities which:

  • showed new ways of approaching the data
  • supported discovery learning tasks with meaningful outcomes
  • explored the context and pragmatic functions of the data
  • reflected pragmatic usage, perhaps even referring to L1 as a resource for this
  • focused on the listener and interpersonal aspects rather than just the speaker

It was clear that the teachers were intellectually engaged and excited, so we considered the reasons why their tasks had taken a more traditional path than expected. Many of these have been raised in the past by Tim Johns and Simon Borg. There is no doubt that the heavy teacher workload affects how far teachers feel they can be innovative with materials. There is a surety in doing what you know and what you know works. Also, many teachers, despite being in the classroom every day, need a certain confidence to design input when this is traditionally something that has been left to syllabus and course book creators. Another issue was that teachers would probably need more support in understanding corpus data, and many don’t have the time to do extra training. Finally, with this particular data, teachers may not be fully aware of the importance of pragmatic and strategic competences. Often these are seen as an ‘add-on’ rather than a core competence, especially in contemporary communication contexts where English is largely being used as a lingua franca.

Ultimately, there was a difference between what the researchers ‘saw’ and what the teachers ‘saw’. As an alternative, we asked a group of expert material writers to produce new tasks and they have produced some innovative material. We concluded that maybe this is a fairer approach. In other words, instead of expecting each of the roles involved in language teaching (SLA researchers, teachers, materials designers) to find the time to become experts in new skills, it may sometimes be better to use each other as a resource. This would still be a learning experience as we draw on each other’s expertise.

In future, if we want teachers to collaborate on designing materials, we must make sure that we discuss the philosophy or pedagogy behind our objectives (Rapti, 2013) with our collaborators, that we show how the data is mapped to relevant curricula and that we recognise the restrictions caused by practical issues such as a lack of time or training opportunities.

The series of worksheets is now available from the Trinity College London website. More to come in the future so keep checking.

TLC and innovation in language testing

One of the objectives of Trinity College London investing in the Trinity Lancaster Spoken Corpus has been to share findings with the language assessment community. The corpus allows us to develop an innovative approach to validating test constructs and offers a window into the exam room so we can see how test takers utilise their language skills in managing the series of test tasks.

Recent work by the CASS team in Lancaster has thrown up a variety of features that illustrate how test takers voice their identity in the test, how they manage interaction through a range of strategic competences and how they use epistemic markers to express their point of view and negotiate a relationship with the examiner (for more information see Gablasova et al. 2015). I have spent the last few months disseminating these findings at a range of language testing conferences and have found that the audiences have been fascinated by the findings.

We have presented findings at BAAL TEASIG in Reading, at EAQUALS in Lisbon and at EALTA in Valencia. Audiences ranged from assessment experts to teacher educators and classroom practitioners, and there was great interest both in how the test takers manage the exam and in the manifestations of L2 language. Each presentation was tailored to the audience and the theme of the conference. In separate presentations, we covered how assessments can inform classroom practice, how the data could inform the type of feedback we give learners and how the data can be used to help validate aspects of the test construct. The feedback has been very positive, urging us to investigate further. Comments have praised the extent and quality of the corpus and range from the fact that the evidence “is something that we have long been waiting for” (Dr Parvaneh Tavakoli, University of Reading) to musings on what some of the data might mean both for how we assess spoken language and for the classroom. It has certainly opened the door on the importance of strategic and pragmatic competences as well as validating Trinity’s aims to allow the test taker to bring themselves into the test. The excitement spilled over into some great tweets. There is general recognition that the data offers something new – sometimes confirming what we suspected and sometimes – as with all corpora – refuting our beliefs!

We have always recognised that the data is constrained by the semi-formal context of the test. However, each test is structured but not scripted and has tasks which represent language pertinent to communicative events in the wider world, and this allows the test taker to produce language which is more reflective of naturally occurring speech than in many other oral tests. It has been enormously helpful to have feedback from audiences who have fully engaged with the issues raised, highlighting aspects we can investigate in greater depth as well as raising features they would like to know more about. These features are precisely those that the research team wishes to explore in order to develop ‘a more fine-grained and comprehensive understanding of spoken pragmatic ability and communicative competence’ (Gablasova et al. 2015: 21).

One of the next steps is to show how this data can be used to develop and support performance descriptors. Trinity is confident that the features of communication which the test takers display are captured in its new Integrated Skills in English exam, validating claims that Trinity assesses real-world communication.

From Corpus to Classroom 1

The Trinity Lancaster Corpus of Spoken Learner English is providing multiple sets of data that can be used not only to validate the quality of our tests but also – and most importantly – to feed back important features of language that can be utilised in the classroom. It is essential that some of our research is focused on how Trinity informs and supports teachers in improving communicative competences in their learners, and this forms part of an ongoing project the research team are setting up in order to give teachers access to this information.

Trinity has always been focused on communicative approaches to language teaching and the heart of the tests is about communicative competences. The research team are especially excited to see that the data is revealing the many ways in which test takers use these communicative competences to manage their interaction in the spoken tests. It is very pleasing to see that the corpus evidence not only supports claims that the Trinity tests of spoken language are highly interactive but also establishes some very clear features of effective communication that can be utilised by teachers in the classroom.

The strategies which test takers use to communicate successfully include:

  • Asking more questions

Here the test taker relies less on declarative sentences to move a conversation forward but asks clear questions (direct and indirect) that are more immediately accessible to the listener.

  • Demonstrating active listenership through backchannelling

This involves offering more support to the conversational partner by using signals such as okay, yes, uhu, oh, etc. to demonstrate engaged listenership.

  • Taking responsibility for the conversation through their contributions

Successful test takers help move the conversation along by creating opportunities with, e.g., questions, comments or suggestions that their partner can easily react to.

  • Using fewer hesitation markers

Here the speaker makes sure they keep talking and uses fewer markers such as er and erm, which can interrupt fluency.

  • Clarifying what is said to them before they respond

This involves the test taker checking through questions that they have understood exactly what has been said to them.
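As a rough illustration of how strategies like these might be counted in a transcript, here is a minimal Python sketch. It is not the method used by the research team: the marker lists are taken only from the examples mentioned above, the function name `profile_turns` and the sample turns are invented for illustration, and it assumes a transcript is simply a list of one speaker's turns with direct questions marked by "?".

```python
import re
from collections import Counter

# Illustrative marker sets taken from the examples in the text; a real
# study would rely on the corpus annotation scheme instead (assumption).
BACKCHANNELS = {"okay", "yes", "uhu", "oh"}
HESITATIONS = {"er", "erm"}


def profile_turns(turns):
    """Count simple strategy indicators across one speaker's turns."""
    counts = Counter()
    for turn in turns:
        tokens = re.findall(r"[a-z']+", turn.lower())
        counts["tokens"] += len(tokens)
        # Hesitation markers are counted wherever they occur in a turn.
        counts["hesitations"] += sum(t in HESITATIONS for t in tokens)
        # A turn made up only of backchannel tokens counts as one backchannel.
        if tokens and all(t in BACKCHANNELS for t in tokens):
            counts["backchannels"] += 1
        # Only direct questions marked with "?" are caught here.
        counts["questions"] += turn.count("?")
    return counts


# Hypothetical sample turns, invented for this sketch.
sample = [
    "Okay",
    "Er I think erm the film was good",
    "Do you mean the new one?",
    "Yes oh",
    "What did you like about it?",
]
profile = profile_turns(sample)
print(dict(profile))
```

Counts like these only hint at the strategies: indirect questions, or a "yes" that answers rather than backchannels, would need the context that a properly annotated corpus provides.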

Trinity is hopeful that these types of communicative strategies can be investigated across the tests and across the various levels in order to extract information which can be fed back into the classroom. Teachers – and their learners – are interested to see what actually happens when the learner has the opportunity to put their language into practice in a live performance situation. It makes what goes on in the classroom much more real and gives pointers to how a speaker can cope in these situations.

More details about these points can be found on the Trinity corpus website and classroom teaching materials will be uploaded shortly to support teachers in developing these important strategies in their learners.

Also see CASS briefings for more information on successful communication strategies in L2.

The heart of the matter …

How wonderful it is to get to the inner workings of the creature you helped bring to life! I’ve just spent a week with the wonderful – and superbly helpful – team at CASS devoting time to matters on the Trinity Lancaster Spoken Corpus.

Normally I work from London, situated in the very 21st century environment of the web – I plan, discuss and investigate the corpus across the ether with my colleagues in Lancaster. They regularly visit us with updates but the whole ‘system’ – our raison d’être if you like – sits inside a computer. This, of course, does make for very modern research and allows a much wider circle of access and collaboration. But there is nothing like sitting in the same room as colleagues, especially over the period of a few days, to test ideas, to leap connections and to get the neural pathways really firing.


It’s been a stimulating week, not least because we started with the wonderful GraphColl, a new collocation tool which allows the corpus to come to life before our eyes. As the ‘bubbles’ of lexis chase across the screen searching for their partners, they pulse and bounce. Touching one of them lights up more collocations, revealing the mystery of communication. Getting the number right turns out to be critical in producing meaningful data that we can actually read – too loose and we end up with a density we cannot untangle; the less the better seems to be the key. It did occur to me that finally language had produced something that could contribute to the Science Picture Library where GraphColl images could complement the shots of language activity in the brain. I’ve been experimenting with it this week – digging out question words from part of the corpus to find out how patterned they are – more to come.
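To give a feel for why the cut-off matters, here is a minimal Python sketch of window-based collocation counting. It is not GraphColl's implementation: it assumes that ‘the number’ refers to a cut-off of some kind, and a raw frequency cut-off (the hypothetical `min_freq` parameter) stands in for whatever statistic and threshold the tool actually applies. The function name `collocates` and the sample sentence are also invented for illustration.

```python
from collections import Counter


def collocates(tokens, node, span=2, min_freq=2):
    """Return words found within `span` tokens of `node` at least `min_freq` times."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Collect the words to the left and right of this occurrence of the node.
        left = max(0, i - span)
        window = tokens[left:i] + tokens[i + 1:i + 1 + span]
        counts.update(window)
    # Keep only collocates that meet the frequency cut-off.
    return {word: n for word, n in counts.items() if n >= min_freq}


# Hypothetical sample text, invented for this sketch.
text = (
    "what do you do in your free time i spend my free time reading "
    "do you have time to read"
).split()

loose = collocates(text, "time", span=2, min_freq=1)
strict = collocates(text, "time", span=2, min_freq=2)
print(len(loose), strict)
```

With the loose setting every neighbouring word counts as a collocate of "time"; with the stricter one only "free" survives, which is the kind of thinning that makes a collocation graph readable.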

We’ve also been able to put more flesh on the bones of an important project developed by Vaclav Brezina – how to make the corpus meaningful for teachers (and students). Although we live in an era where the public benefit of science is rightly foregrounded, it can be hard sometimes to ‘translate’ the science and complexity of the supporting technology so that it is of real value to the very people who created the corpus. Vaclav has been preparing a series of extracts of corpus data that can come full circle back into the classroom by showing teachers and their students the way that language works – not in the textbooks but in real ‘lingua franca’ life. In other words, demonstrating the language that successful learners use to communicate in global contexts. This is going to be turned into a series of teaching materials with the quality and relevance being assured by crowdsourcing teaching activities from the teachers themselves.

[Figure: Collocates of ‘time’ in the GESE interactive task]

Meanwhile I am impressed by how far the corpus – this big data – is able to support Trinity by helping to build robust validity arguments for the GESE test. This is critical in helping Trinity’s core audience – our test takers – to answer the questions that matter to them: why should I do this test, what will the test demonstrate, what effect will it have on my learning, and is it fair? All in all, a very productive week.

Trinity Lancaster Corpus: A glimpse of the future

At Trinity we are totally impressed that our spoken learner corpus is now just over 1.5 million words. Although there are still some quality checks to run, it means we’ve reached that anticipatory moment where we can start digging into the goldmine and seeing what insights the data can offer. We’ve been working closely with CASS so that their team have been able to participate in Trinity’s test creation processes as well as examiner training sessions. This has allowed the researchers to fully understand the communicative skills the exam elicits and to identify interesting aspects of language that might be investigated. Equally, the Trinity team are very much looking forward to an upcoming visit to Lancaster where the CASS team will guide us on the corpus tools and the type of reports we can run that will access the data we need for our own research interests into the test itself.

Currently we are so excited at having such a wealth of data at our fingertips that we are in that dangerous moment of skimming the corpus to see if our assumptions are played out. We’ve all been there – when you are convinced that the corpus will finally confirm your long held beliefs about how learners use language – only to discover that you are wrong or the evidence is not there! This is, however, significantly ameliorated by emerging findings that will allow us to add a quantitative component to our test validity arguments. Mining corpus data indicates a new approach to evidencing that the test tasks are performing as anticipated – and as designed! And then there’s that little delve where the numbers and patterns indicate something unanticipated – how delicious!

This Trinity Lancaster corpus is fascinating because it comprises data from tasks where the candidate is given free rein to ‘show off’ their language skills and engage in authentic interaction with the examiner – thus giving a very close parallel with real life and so enriching applied linguistic research. At the same time, the test also contains a task type which really homes in on the candidate’s skill at enacting Gricean principles of co-operation, thus allowing us to investigate metacognitive processes such as how learners manage a conversation.

It has to be said that we recognize that the opportunities and insights offered by this unique corpus are in large part down to the high-quality corpus transcription and annotation process implemented by CASS. We are now planning the collection of 2014 data, including, we hope, a wider range of L1s – because we are now totally addicted!

Trinity oral test corpus: The first hurdle

At Trinity we are wildly excited – yes, wildly – to finally have our corpus project set up with CASS. It’s a unique opportunity to create a learner corpus of English based on some fairly free-flowing L2 language which is not too constrained by the testing context. All Trinity oral tests are recorded and most of the tests include one or two tasks where the candidate has free rein to talk about their own interests in their own way – very much their own contributions, expressed as themselves. We have been hoping to use what is referred to as our ‘gold dust’ for research that will be meaningful – not just to the corpus community but also in terms of the impact on our tests and our feedback to learners and teachers. Working with CASS has now given us this golden opportunity.

The project is now up and running and in the corpus-building stage, and we have moved from the heady excitement of imagining what we could do with all the data to the grindstone of pulling together all the strands of metadata needed to make the corpus robust and useful. The challenges are real – for example, we need to log first languages but how do we ensure reliability? Metadata is now an opt-in in most countries so how do we capture everyone? Even when the data boxes are completed, how do we know the information is true? No, the only way is the very non-technological method of contacting the students again and following up in person.

A related concern is whether the metadata we need has shifted. We would normally be interested in what kind of input students had had to their learning, e.g. how many years of study. In the past, part of this data gathering was to ask about time learners had spent in an English-speaking country. Should this now be shifted to time spent watching videos online in English, on social media, or reading online sources? What is relevant – and also collectable?

The challenges in what might be considered non-core information are forcing us to re-examine how sure we are about influences on learning – not just from our perception but from the learner’s perception as well.