Spoken Learner Corpus (SLC) Project

This project is a collaboration between CASS and Trinity College London, a major international exam board. The aim of the project is to create a large corpus of learner (and examiner) speech for use in a wide range of research contexts, including Second Language Acquisition, language testing, L2 pedagogy and materials development. The Trinity Lancaster Learner Corpus will be made freely available to the research community (planned release 2017).

The corpus will be a unique research resource for investigating learner speech at different proficiency levels (advanced, intermediate and lower intermediate/threshold) and will provide an insight into spoken learner production across different tasks (both monologic and interactive). The corpus will also sample the language of learners with a variety of L1 backgrounds, representing English speakers from Italy, Spain, Mexico, China, India, Sri Lanka and Russia.

Team:

Principal Investigator: Tony McEnery

Co-Investigators:

Trinity Team:

Elaine Boyd
Cathy Taylor
Avril Ikeda-Wood

Senior Research Associate: Dana Gablasova

Audio transcriber: Ruth Avon

Read the latest updates on this project:

Introductory Blog – Hanna Schmueck (18 November 2020)
I am very honoured to have received the Geoffrey Leech Outstanding MA Student Award for my MA in Language and Linguistics. This award traditionally goes to the MA student with the highest overall average. I started my postgraduate journey in September 2019 after finishing my undergraduate degree at the University of Bamberg (Germany) in 2018 …
Corpus Linguistics + Language Testing Workshop (3 April 2019)
‘Corpus-Based Approaches to Language Testing’ was held on Thursday 7th March 2019 and was the second event organised by the Trinity Lancaster Corpus research group at Lancaster University. This event was supported by an ESRC International Networking grant and hosted by the ESRC Centre for Corpus Approaches to Social Science (CASS). We welcomed 45 people from …
How to Produce Vocabulary Lists (1 August 2017)
As part of the Forum discussion in Applied Linguistics, we have formulated some basic principles of corpus-based vocabulary studies and pedagogical wordlist creation and use. These principles can be summarised as follows: Explicitly define the vocabulary construct. Operationalize the vocabulary construct using transparent and replicable criteria. If using corpora, take corpus evidence seriously and avoid cherry-picking. Use multiple sources …
Data-driven learning: learning from assessment (7 June 2017)
The process of converting valuable spoken corpus data into classroom materials is not necessarily straightforward, as a recent project conducted by Trinity College London reveals. One of the buzz words we increasingly hear from teacher trainers in English Language Teaching (ELT) is the use of data-driven learning. This ties in with other contemporary pedagogies, such as discovery learning. A key …
Corpus-based insights into spoken L2 English: Introducing eight projects that use the Trinity Lancaster Corpus (22 March 2017)
In November 2016, we announced the Early Data Grant Scheme in which researchers could apply for access to the Trinity Lancaster Corpus (TLC) before its official release in 2018. The Early Data subset of the corpus contains 2.83 million words from 1,244 L2 speakers. The Trinity Lancaster Corpus project is a product of an ongoing collaboration …
Further Trinity Lancaster Corpus research: Examiner strategies (25 August 2016)
This month saw a further development in the corpus analyses: the examiners. Let me introduce myself: my name is Cathy Taylor and I’m responsible for examiner training at Trinity, and I was very pleased to be asked to do some corpus research into the strategies the examiners use when communicating with the test takers. In the GESE …
TLC and innovation in language testing (26 May 2016)
One of the objectives of Trinity College London investing in the Trinity Lancaster Spoken Corpus has been to share findings with the language assessment community. The corpus allows us to develop an innovative approach to validating test constructs and offers a window into the exam room so we can see how test takers utilise their …
From Corpus to Classroom 2 (16 March 2016)
There is great delight that the Trinity Lancaster Corpus is providing so much interesting data that can be used to enhance communicative competences in the classroom. From Corpus to Classroom 1 described some of these findings. But how exactly do we go about ‘translating’ this for classroom use so that it can be used by …
Syntactic structures in the Trinity Lancaster Corpus (3 March 2016)
We are proud to announce a collaboration with Markus Dickinson and Paul Richards from the Department of Linguistics, Indiana University, on a project that will analyse syntactic structures in the Trinity Lancaster Corpus. The focus of the project is to develop a syntactic annotation scheme of spoken learner language and apply this scheme to the Trinity …
From Corpus to Classroom 1 (17 February 2016)
The Trinity Lancaster Corpus of Spoken Learner English is providing multiple sets of data that can not only be used for validating the quality of our tests but also – and most importantly – to feed back important features of language that can be utilised in the classroom. It is essential that some of our research …
