Tue 27 May 2014, 15.00-16.30, Lancaster University, Frankland Lecture Theatre, Faraday Building
Learner language and natural language processing
Lecture by Professor Detmar Meurers
The automatic analysis of learner language is potentially relevant in a range of contexts, from the online analysis of learner language aimed at providing individual feedback in Intelligent Language Tutoring Systems (ILTS) to the automatic annotation of learner corpora in support of Second Language Acquisition (SLA) research and Foreign Language Teaching and Learning (FLTL) practice. In this talk, I want to raise some questions about the interpretation of learner data involved in any such analysis, focusing the discussion on learner corpora.
Learner corpora as collections of language produced by second language learners have been systematically collected since the 90s, and with readily available collections such as ICLE and new Big-Data collections such as EFCamDat there is a growing empirical basis of potential relevance to SLA research. Yet, as soon as the research questions go beyond the acquisition of vocabulary and constructions with unambiguous surface indicators, corpora must be enhanced with linguistic annotation to support efficient retrieval of the instances of data that are relevant for specific research questions.
In contrast to the different types of linguistic annotation schemes which have been developed for native language corpora, the discussion on which linguistic annotation is meaningful and appropriate for learner language is only starting. When formulating linguistic generalizations, one generally relies on a long tradition of linguistic analysis that has established an inventory of categories and properties abstracting away from the surface strings. We will show that traditional linguistic categories are not necessarily an appropriate index into the space of interlanguage realizations and their systematicity, which research into second language acquisition aims to capture. We will argue for balancing robustness of categorization and representation of the actual observations and their variability. Complementing the discussion of the corpus annotation as such, we then discuss the need for explicit information about the task from which the corpus resulted and the learners who produced it for interpreting and annotating learner data.
- Detmar Meurers (to appear). "Learner Corpora and Natural Language Processing". The Cambridge Handbook of Learner Corpus Research, edited by Sylviane Granger, Gaëtanelle Gilquin and Fanny Meunier. Cambridge University Press. Draft (to be revised) available at: http://purl.org/dm/papers/Meurers-LCNLP-draft.pdf
- Detmar Meurers (2013). "Natural Language Processing and Language Learning". Encyclopedia of Applied Linguistics, edited by Carol A. Chapelle. Blackwell.
- Ana Díaz-Negrillo, Detmar Meurers, Salvador Valera, and Holger Wunsch (2010). "Towards interlanguage POS annotation for effective learner corpora in SLA and FLT". Language Forum. Vol 36, No 1-2. Special Issue on New Trends in Language Teaching, edited by Carmen Pérez Basanta.
This event is a general lecture, co-organized by the Department of Linguistics and English Language, the Second Language Learning and Teaching Group (SLLAT), the ESRC Centre for Corpus Approaches to Social Science (CASS), and the University Centre for Computer Corpus Research on Language (UCREL).
Who can attend: Anyone