Trinity Lancaster Corpus

The Trinity Lancaster Corpus (TLC) is currently the largest corpus of spoken L2 English. It was developed through a collaboration between the Centre for Corpus Approaches to Social Science (CASS) at Lancaster University and Trinity College London, a major international examination board. The data were collected from 2012 to 2018 as part of the Graded Examinations in Spoken English (GESE), an exam developed and administered by Trinity College London. Overall, the corpus contains 4.2 million words (tokens) of transcribed spoken interaction between exam candidates (L2 speakers of English) and examiners (L1 speakers of English). The L2 data come from over 2,000 speakers from different cultural and linguistic backgrounds and with a range of sociolinguistic characteristics.

Lancaster Team: Dana Gablasova, Vaclav Brezina and Tony McEnery

Audio Transcriber: Ruth Avon 


Gablasova, D., Brezina, V., & McEnery, T. (2019). The Trinity Lancaster Corpus: Development, Description and Application. International Journal of Learner Corpus Research, 5(2), 126-158.

“This paper introduces a new corpus resource for language learning research, the Trinity Lancaster Corpus (TLC), which contains 4.2 million words of interaction between L1 and L2 speakers of English. The corpus includes spoken production from over 2,000 L2 speakers from different linguistic and cultural backgrounds at different levels of proficiency engaged in two to four tasks.”


The Trinity Lancaster Corpus (TLC) is now available via TLC Hub (password: Lancaster1964).