Event Information:

  • Tue

    Learner language and natural language processing

    15.00-16.30, Lancaster University, Frankland Lecture Theatre, Faraday Building

    Lecture by Professor Detmar Meurers

    The automatic analysis of learner language is potentially relevant in a range of contexts, from the online analysis of learner language aimed at providing individual feedback in Intelligent Language Tutoring Systems (ILTS) to the automatic annotation of learner corpora in support of Second Language Acquisition (SLA) research and Foreign Language Teaching and Learning (FLTL) practice. In this talk, I want to raise some questions about the interpretation of learner data involved in any such analysis, focusing the discussion on learner corpora.

    Learner corpora as collections of language produced by second language learners have been systematically collected since the 90s, and with readily available collections such as ICLE and new Big-Data collections such as EFCamDat there is a growing empirical basis of potential relevance to SLA research. Yet, as soon as the research questions go beyond the acquisition of vocabulary and constructions with unambiguous surface indicators, corpora must be enhanced with linguistic annotation to support efficient retrieval of the instances of data that are relevant for specific research questions.

    In contrast to the different types of linguistic annotation schemes which have been developed for native language corpora, the discussion on which linguistic annotation is meaningful and appropriate for learner language is only starting. When formulating linguistic generalizations, one generally relies on a long tradition of linguistic analysis that has established an inventory of categories and properties abstracting away from the surface strings. We will show that traditional linguistic categories are not necessarily an appropriate index into the space of interlanguage realizations and their systematicity, which research into second language acquisition aims to capture. We will argue for balancing robustness of categorization and representation of the actual observations and their variability. Complementing the discussion of the corpus annotation as such, we then discuss the need for explicit information about the task from which the corpus resulted and the learners who produced it for interpreting and annotating learner data.

    This event is a general lecture, co-organized by the Department of Linguistics and English Language, the Second Language Learning and Teaching Group (SLLAT), the ESRC Centre for Corpus Approaches to Social Science (CASS), and the University Centre for Computer Corpus Research on Language (UCREL).


    Who can attend: Anyone