Corpus Data and Psycholinguistics Seminar

On the afternoon of Thursday 19th May 2016, CASS held its first ever psycholinguistics seminar, which brought together researchers from both linguistics and psychology. The theme of the seminar was “Corpus Data and Psycholinguistics”, with a particular focus on experimental psycholinguistics.

The afternoon consisted of four 40-minute presentations which covered a range of different experimental methods including eye-tracking and EEG. Interestingly, the notion of collocation also emerged as a strong theme throughout the presentations. Different types of collocation were addressed, including bigrams, idioms, and compounds, and this prompted thought-provoking discussions about the nature of collocation and the relationship between psycholinguistic results and the different statistical measures of collocation strength.
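To illustrate why the choice of statistical measure matters, here is a minimal sketch that compares two commonly used collocation measures, the MI score and the t-score. The seminar report does not say which measures were discussed, so both the choice of measures and all frequencies below are assumptions made purely for illustration; the function name `collocation_scores` is hypothetical, and the calculation ignores window size for simplicity.

```python
import math

def collocation_scores(pair_freq, node_freq, collocate_freq, corpus_size):
    """Return (MI score, t-score) for a word pair, ignoring window size."""
    # Expected co-occurrence frequency if the two words were independent.
    expected = node_freq * collocate_freq / corpus_size
    # MI score: log2 of observed over expected frequency.
    mi = math.log2(pair_freq / expected)
    # t-score: difference between observed and expected, scaled by sqrt(observed).
    t_score = (pair_freq - expected) / math.sqrt(pair_freq)
    return mi, t_score

# Invented frequencies: a rare but exclusive pair versus a frequent, looser pair.
rare_mi, rare_t = collocation_scores(10, 20, 20, 1_000_000)
freq_mi, freq_t = collocation_scores(1000, 50_000, 10_000, 1_000_000)

print(f"rare pair:     MI={rare_mi:.2f}, t={rare_t:.2f}")
print(f"frequent pair: MI={freq_mi:.2f}, t={freq_t:.2f}")
```

With these invented numbers the MI score ranks the rare pair higher while the t-score ranks the frequent pair higher, which is one reason the same experimental result may look different depending on how collocation strength is measured.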

The first presentation was delivered by Professor Padraic Monaghan from the Psychology Department at Lancaster University. In this presentation, Padraic provided an engaging introduction to computational modelling in psycholinguistics, focusing mainly on connectionist models where the input determines the structure of processing. This talk prompted a particularly interesting observation about the relationship between connectionist models and parts-of-speech tags in corpora.

In the second presentation, Dr Phil Durrant from the University of Exeter provided a critical perspective on his own earlier work on whether psycholinguistic priming is evident in collocations at different levels of frequency, and on the distinction between the related notions of collocation and psychological association. This presentation also offered an interesting insight into the different ways in which corpus linguistics and psychological experimentation can be combined in psycholinguistic studies, which helped to contextualise the studies reported in the other presentations within the field of psycholinguistics.

After a short break, I presented the results of the first of several studies which will make up my PhD thesis. This initial study pilots a procedure for using EEG to determine whether or not the brain is sensitive to the transition probabilities between words. This was an excellent opportunity for me to gain feedback on my work and I really appreciate the input and suggestions for further reading that I received from participants at this event.
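For readers unfamiliar with the term, a transition probability here means the likelihood that one word follows another. The sketch below estimates forward transition probabilities from bigram counts in a toy corpus. It is only an illustration under stated assumptions: the toy sentence, the function name `transition_probabilities`, and the simple relative-frequency estimate are all hypothetical and are not taken from the study itself.

```python
from collections import Counter

def transition_probabilities(tokens):
    """Estimate P(next word | current word) from bigram counts."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    # Count each word only in positions where it has a successor,
    # so the probabilities for any given first word sum to 1.
    firsts = Counter(tokens[:-1])
    return {(w1, w2): n / firsts[w1] for (w1, w2), n in bigrams.items()}

# Toy corpus, invented for illustration.
toy_corpus = "the cat sat on the mat and the cat slept".split()
probs = transition_probabilities(toy_corpus)

print(probs[("the", "cat")])  # "the" is followed by "cat" in 2 of its 3 occurrences
print(probs[("cat", "sat")])  # "cat" is followed by "sat" in 1 of its 2 occurrences
```

In a real study the counts would come from a large reference corpus rather than a single sentence.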

The final presentation of the afternoon was delivered by Professor Michaela Mahlberg and Dr Gareth Carroll from the University of Birmingham. This presentation drew upon eye-tracking data from a study exploring literary reading in order to pinpoint the methodological issues associated with combining eye-tracking techniques with literary corpora, and with corpus data more generally.

With such an interesting series of talks sharing the theme of “Corpus Data and Psycholinguistics”, the CASS psycholinguistics seminar proved to be a very successful event. We would like to thank the presenters and all of the participants who attended the seminar for their contribution to the discussions, and we are really looking forward to hosting similar seminars in the near future.

The heart of the matter …

How wonderful it is to get to the inner workings of the creature you helped bring to life! I’ve just spent a week with the wonderful – and superbly helpful – team at CASS devoting time to matters on the Trinity Lancaster Spoken Corpus.

Normally I work from London, situated in the very 21st-century environment of the web – I plan, discuss and investigate the corpus across the ether with my colleagues in Lancaster. They regularly visit us with updates but the whole ‘system’ – our raison d’être if you like – sits inside a computer. This, of course, does make for very modern research and allows a much wider circle of access and collaboration. But there is nothing like sitting in the same room as colleagues, especially over the period of a few days, to test ideas, to make new connections and to get the neural pathways really firing.


It’s been a stimulating week not least because we started with the wonderful GraphColl, a new collocation tool which allows the corpus to come to life before our eyes. As the ‘bubbles’ of lexis chase across the screen searching for their partners, they pulse and bounce. Touching one of them lights up more collocations, revealing the mystery of communication. Getting the number right turns out to be critical in producing meaningful data that we can actually read – too loose and we end up with a density we cannot untangle; the less the better seems to be the key.  It did occur to me that finally language had produced something that could contribute to the Science Picture Library where GraphColl images could complement the shots of language activity in the brain. I’ve been experimenting with it this week – digging out question words from part of the corpus to find out how patterned they are – more to come.

We’ve also been able to put more flesh on the bones of an important project developed by Vaclav Brezina – how to make the corpus meaningful for teachers (and students). Although we live in an era where the public benefit of science is rightly foregrounded, it can be hard sometimes to ‘translate’ the science and complexity of the supporting technology so that it is of real value to the very people who created the corpus. Vaclav has been preparing a series of extracts of corpus data that can come full circle back into the classroom by showing teachers and their students the way that language works – not in the textbooks but in real ‘lingua franca’ life. In other words, demonstrating the language that successful learners use to communicate in global contexts. This is going to be turned into a series of teaching materials with the quality and relevance being assured by crowdsourcing teaching activities from the teachers themselves.

Figure: Collocates of “time” in the GESE interactive task

Meanwhile I am impressed by how far the corpus – this big data – is able to support Trinity by helping to build robust validity arguments for the GESE test. This is critical in helping Trinity’s core audience – our test takers – to understand: why should I do this test? What will the test demonstrate? What effect will it have on my learning? Is it fair? All in all, a very productive week.

Welcoming new CASS Senior Research Associate: Carmen Dayrell

We are pleased to announce that Dr Carmen Dayrell has joined the ESRC Centre for Corpus Approaches to Social Science as the Senior Research Associate on the Changing Climates project. You can read a bit more about her in her own words, below.

My main research interests relate to the use of corpus methodologies to study language production, from various perspectives and in different settings.

I was first interested in the distinctive features of translated language and carried out a corpus study to investigate potential differences (and similarities) between collocational patterns in translated and non-translated texts of the same language. The main issue under my investigation was whether collocational patterns tend to be less diverse (i.e. reduced in range) in translations when compared to texts originally written in the language in question.

I then turned my attention to English academic writing and examined lexical and syntactical features of abstracts written in English by Brazilian graduate students, hence native speakers of Portuguese, from various disciplines. My primary goal was to provide insights for the development and improvement of teaching material that can directly target the specific needs of Brazilian novice researchers. This research project involved comparing the textual patterns used by students vis-à-vis those in abstracts taken from published papers from the same disciplines.

My current research focuses on the discourse of climate change in media coverage. This is part of a larger project which aims to conduct a large-scale, systematic empirical analysis of climate change discourse across Britain and Brazil. Our primary purpose is to investigate how the issue has been framed in these two countries in the past decade.


  • PhD in Translation Studies, CTIS – The University of Manchester (UK)
  • MA in Applied Linguistics, Federal University of Minas Gerais (UFMG, Brazil)
  • BA in Business Administration and Accountancy, Pontifical Catholic University of Minas Gerais (PUC-MG, Brazil)

Selected Publications

DAYRELL, C. (2011a) ‘Corpora and academic English teaching: lexico-grammatical patterns in abstracts written by Brazilian graduate students’. Vander Viana and Stella Tagnin (Eds.) Corpora in Foreign Language Teaching. São Paulo: HUB Editorial, pp. 137-172. (in Portuguese)

DAYRELL, C. (2011b) ‘Anticipatory ‘it’ in English abstracts: a corpus-based study of non-native student and published writing’. Stanisław Goźdź-Roszkowski (Ed.) Explorations across Languages and Corpora. Frankfurt am Main: Peter Lang, pp. 581-598.

DAYRELL, C. (2010) ‘Frequency and lexico-grammatical patterns of sense-related verbs in English and Portuguese abstracts’. Richard Xiao (Ed.) Using Corpora in Contrastive and Translation Studies. Newcastle: Cambridge Scholars Publishing, pp. 486-507.

DAYRELL, C. (2009) ‘Sense-related verbs in English scientific abstracts: a corpus-based study of students’ writing’. ESP Across Cultures 6: 61-78.

DAYRELL, C. (2008) ‘Investigating the preference of translators for recurrent lexical patterns: a corpus-based study’. Juliane House (Ed.) Beyond Intervention: Universals in Translation?, TRANSKOM, First and Special Issue of TRANSKOM (1). Available at

DAYRELL, C. (2007) ‘A quantitative approach to investigate collocations in translated texts’. International Journal of Corpus Linguistics, 12(3): 415-444.