New resources are being added regularly to the new CASS: Briefings tab above, so check back soon.
We at the Trinity Lancaster Learner Corpus team are very pleased to announce that we have a logo for our lovely corpus. We very much hope that it represents the corpus by capturing its key features. We knew we wanted to portray what we feel are the unique aspects of our corpus – interactive L2 spoken data – but that within this we had to reflect the identities of both the ESRC Centre for Corpus Approaches to Social Science at Lancaster University and Trinity. It was a challenge in the design brief to present abstract concepts in a way that allowed the designer to ‘see’ our vision.
We are very happy with the result. We have a logo which displays our respective organisational ‘colours’ in two speech bubbles. The two bubbles denote the largely interactive nature of the data and are slightly squared off to shift away from cartoon associations and towards the academic nature of our endeavours.
We are happy that the logo gives the corpus a very strong identity and is meaningful to the wider community of researchers and language practitioners who are likely to engage with the corpus in the future. It all feels very real now as the Lancaster team process and tag the final recordings that make up the data.
On Friday 30th January 2015, I gave a talk at the International ESOL Examiner Training Conference 2015 in Stafford. Every year, the Trinity College London, CASS’s research partner, organises a large conference for all their examiners which consists of plenary lectures and individual training sessions. This year, I was invited to speak in front of an audience of over 300 examiners about the latest development in the learner corpus project. For me, this was a great opportunity not only to share some of the exciting results from the early research based on this unique resource, but also to meet the Trinity examiners; many of them have been involved in collecting the data for the corpus. This talk was therefore also an opportunity to thank everyone for their hard work and wonderful support.
It was very reassuring to see the high level of interest in the corpus project among the examiners who have a deep insight into examination process from their everyday professional experience. The corpus as a body of transcripts from the Trinity spoken tests in some way reflects this rich experience offering an overall holistic picture of the exam and, ultimately, L2 speech in a variety of communicative contexts.
Currently, the Trinity Lancaster Corpus consists of over 2.5 million running words sampling the speech of over 1,200 L2 speakers from eight different L1 and cultural backgrounds. The size itself makes the Trinity Lancaster Corpus the largest corpus of its kind. However, it is not only the size that the corpus has to offer. In cooperation with Trinity (and with great help from the Trinity examiners) we were able to collect detailed background information about each speaker in our 2014 dataset. In addition, the corpus covers a range of proficiency levels (B1– C2 levels of the Common European Framework), which allows us to research spoken language development in a way that has not been previously possible. The Trinity Lancaster Corpus, which is still being developed with an average growth of 40,000 words a week, is an ambitious project: Using this robust dataset, we can now start exploring crucial aspects of L2 speech and communicative competence and thus help language learners, teachers and material developers to make the process of L2 learning more efficient and also (hopefully) more enjoyable. Needless to say, without Trinity as a strong research partner and the support from the Trinity examiners this project wouldn’t be possible.
At Trinity we are wildly excited – yes, wildly – to finally have our corpus project set up with CASS. It’s a unique opportunity to create a learner corpus of English based on some fairly free flowing L2 language which is not too constrained by the testing context. All Trinity oral tests are recorded and most of the tests include one or two tasks where the candidate has free rein to talk about their own interests in their own way – very much their own contributions, expressed as themselves. We have been hoping to use what is referred to as our ‘gold dust’ for research that will be meaningful – not just to the corpus community but also in terms of the impact on our tests and our feedback to learners and teachers. Working with CASS has now given us this golden opportunity.
The project is now up and running and in the corpus building stage and we have moved from the heady excitement of imaging what we could do with all the data to the grindstone of pulling together all the strands of meta data needed to make the corpus robust and useful. The challenges are real – for example, we need to log first languages but how do we ensure reliability? Meta data is now an opt-in in most countries so how do we capture everyone? Even when the data boxes are completed how do we know it’s true? No, the only way is the very non-technological method of contacting the students again and following up in person.
A related concern is has the meta data we need shifted? We would normally be interested in what kind of input students had had to their learning so e.g. how many years study etc. In the past, part of this data gathering was to ask about time learners had spent in an English-speaking country. Should this now be shifted to time spent watching videos online in English, in social media, in reading online sources? What is relevant –and also collectable?
The challenges in what might be considered this no-core information is forcing us to re-examine how sure we are about influences on learning – not just our perception but form the learner’s perception as well.