Trinity Lancaster Corpus: A glimpse of the future

At Trinity we are totally impressed that our spoken learner corpus is now just over 1.5 million words. Although there are still some quality checks to run, it means we’ve reached that anticipatory moment where we can start digging into the goldmine and seeing what insights the data can offer. We’ve been working closely with CASS, and their team have been able to participate in Trinity’s test creation processes as well as examiner training sessions. This has allowed the researchers to fully understand the communicative skills the exam elicits and to identify interesting aspects of language that might be investigated. Equally, the Trinity team are very much looking forward to an upcoming visit to Lancaster, where the CASS team will guide us on the corpus tools and the types of report we can run to access the data we need for our own research into the test itself.

Currently we are so excited at having such a wealth of data at our fingertips that we are in that dangerous moment of skimming the corpus to see if our assumptions are borne out. We’ve all been there – when you are convinced that the corpus will finally confirm your long-held beliefs about how learners use language – only to discover that you are wrong or the evidence is not there! That disappointment is, however, considerably eased by emerging findings that will allow us to add a quantitative component to our test validity arguments. Mining corpus data offers a new approach to evidencing that the test tasks are performing as anticipated – and as designed! And then there’s that little delve where the numbers and patterns indicate something unanticipated – how delicious!

This Trinity Lancaster corpus is fascinating because it comprises data from tasks where the candidate is given free rein to ‘show off’ their language skills and engage in authentic interaction with the examiner – thus giving a very close parallel with real life and so enriching applied linguistic research. At the same time, the test also contains a task type which really homes in on the candidate’s skill at enacting Gricean principles of co-operation, thus allowing us to investigate metacognitive processes such as how learners manage a conversation.

It has to be said that the opportunities and insights offered by this unique corpus are in large part down to the high-quality corpus transcription and annotation process implemented by CASS. We are now planning the collection of 2014 data, including, we hope, a wider range of L1s – because we are now totally addicted!
