ESRC Postdoctoral Fellowship: The psychological validity of non-adjacent collocations

Having recently completed my PhD in CASS, I am really excited to announce that I have been awarded an ESRC Postdoctoral Fellowship for the upcoming academic year.

My research focuses on finding neurophysiological evidence for the existence of collocations, i.e. sequences of two or more words where the words are statistically highly likely to occur together. There are a lot of different types of collocation, and the different types vary along the dimensions of fixedness and compositionality. Idioms, for example, are highly fixed in the sense that one word cannot typically be substituted for another word. They are also non-compositional, which means that the meaning of the expression cannot be derived from knowing the meaning of the component words.

Previous studies investigating the psychological validity of collocation have tended to focus on idioms and other highly fixed expressions. However, this massively limits the generalizability of the findings. In my research, I therefore use a much more fluid conceptualization of collocation, where sequences of words can be considered to be collocational even if they are not fixed, and even if the meaning of the expression is highly transparent. For example, the word pair clinical trials is a collocation, despite lacking the properties of fixedness and non-compositionality, because the word trials is highly likely to follow the word clinical. In this way, I focus on the transition probabilities between words; the transition probability of clinical trials (as measured in a corpus) is much higher than the transition probability of clinical devices, even though the latter word pair is completely acceptable in English, both in terms of meaning and grammar.
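As a rough illustration of the idea (not the procedure used in the study), a forward transition probability can be estimated from corpus counts as the frequency of the word pair divided by the frequency of the first word. The sketch below assumes a tiny invented token list in place of a real corpus; the function name and the toy data are hypothetical.

```python
from collections import Counter


def forward_transition_probability(tokens, first, second):
    """Estimate P(second | first) as count(first second) / count(first)."""
    # Only count occurrences of `first` that are followed by another word.
    first_counts = Counter(tokens[:-1])
    pair_counts = Counter(zip(tokens, tokens[1:]))
    if first_counts[first] == 0:
        return 0.0
    return pair_counts[(first, second)] / first_counts[first]


# Toy token list invented for illustration; it is not corpus data.
toy_tokens = (
    "the clinical trials began after earlier clinical trials failed "
    "and new clinical devices arrived before clinical trials resumed"
).split()

tp_trials = forward_transition_probability(toy_tokens, "clinical", "trials")
tp_devices = forward_transition_probability(toy_tokens, "clinical", "devices")
print(tp_trials, tp_devices)
```

In this toy data, clinical is followed by trials three times out of four and by devices once, so the first pair has the higher transition probability even though both pairs are acceptable English.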

In my research, I extract collocational word pairs such as clinical trials from the written BNC1994. I then construct matched non-collocational word pairs such as clinical devices, embed the two sets of word pairs into corpus-derived sentences, and then ask participants to read these sentences on a computer screen while electrodes attached to their scalp detect some of their brain activity. This method of recording the electrical activity of the brain using scalp electrodes is known as electroencephalography, or EEG. More specifically, I use the event-related potential (ERP) technique of analysing brainwave data, where the brain activity is measured in response to a particular stimulus (in this case, collocational and non-collocational word pairs).

My PhD consisted of four ERP experiments. In the first two experiments, I investigated whether or not collocations and non-collocations are processed differently (at the neural level) by native speakers of English. In the third experiment, I did the same but with non-native speakers of English. Having found that there are indeed neurophysiological differences in the way that collocations and non-collocations are processed by both native and non-native speakers, I then conducted a fourth experiment to investigate which measures of collocation strength most closely correlate with the brain response. The results of this experiment have really important implications for the field of corpus linguistics, as I found that the two most widely used measures of collocation strength (namely log-likelihood and mutual information) are actually the two that seem to have the least psychological validity.
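For readers unfamiliar with the two measures named above, the sketch below computes them in their common textbook formulations from a 2x2 contingency table of observed and expected frequencies. It is an illustration only: the exact variants used in the thesis may differ, and the function name and the frequencies passed to it are invented.

```python
import math


def association_scores(pair_freq, first_freq, second_freq, corpus_size):
    """Return (mutual information, log-likelihood) for a word pair.

    Textbook formulations: MI = log2(O11 / E11) and
    G2 = 2 * sum(O * ln(O / E)) over the four cells of the table.
    """
    n = corpus_size
    observed = [
        pair_freq,                                 # first word with second word
        first_freq - pair_freq,                    # first word without second
        second_freq - pair_freq,                   # second word without first
        n - first_freq - second_freq + pair_freq,  # neither word
    ]
    expected = [
        first_freq * second_freq / n,
        first_freq * (n - second_freq) / n,
        (n - first_freq) * second_freq / n,
        (n - first_freq) * (n - second_freq) / n,
    ]
    mi = math.log2(observed[0] / expected[0])
    # Cells with an observed frequency of zero contribute nothing to the sum.
    ll = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    return mi, ll


# Hypothetical frequencies, chosen only to show the calculation.
mi_strong, ll_strong = association_scores(10, 20, 20, 1000)
mi_chance, ll_chance = association_scores(4, 20, 200, 1000)
print(mi_strong, ll_strong)
print(mi_chance, ll_chance)
```

The second call uses frequencies where the pair occurs exactly as often as chance predicts, so both scores come out at zero.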

The ESRC Postdoctoral Fellowship is unique in that, although it allows for the completion of additional research, the main focus is actually on disseminating the results of the PhD. Thus, during my year as an ESRC Postdoctoral Fellow, I intend to publish the results of my PhD research in high-impact journals in the fields of corpus linguistics and cognitive neuroscience. I will also present my findings at conferences in both of these fields, and I will attend training workshops in other neuroscientific methods.

The additional research that I intend to do during the term of the Fellowship will build upon my PhD work by using the ERP technique to investigate whether or not the neurophysiological difference in the processing of collocations vs. non-collocations is still apparent when the (non-)collocations contain intervening words. For instance, I want to find out whether or not the collocation take seriously is still recognized as such by the brain when there is one intervening word (e.g. take something seriously) or two intervening words (e.g. take the matter seriously), and so on.
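To make the notion of intervening words concrete, the sketch below counts how often a second word follows a first word with zero, one, or two words in between. It is a minimal illustration under assumed names and an invented example sentence, not the extraction method planned for the study.

```python
from collections import Counter


def gap_counts(tokens, first, second, max_gap=2):
    """Count occurrences of `first ... second` by number of intervening words."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != first:
            continue
        for gap in range(max_gap + 1):
            j = i + gap + 1
            if j < len(tokens) and tokens[j] == second:
                counts[gap] += 1
                break  # count only the nearest occurrence of `second`
    return counts


# Invented sentence containing the pair at three different distances.
toy_tokens = (
    "we take seriously every report and take something seriously "
    "when asked to take the matter seriously"
).split()

counts = gap_counts(toy_tokens, "take", "seriously")
print(dict(counts))
```

Here the pair occurs once with no intervening word, once with one, and once with two, mirroring the take seriously examples above.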

Investigating the processing of these non-adjacent collocations is important for the development of linguistic theory. While my PhD thesis focused on word pairs rather than longer sequences of words in order to reduce the number of factors that might influence how the word sequences were processed, making it feasible to conduct controlled experiments, this is actually a very narrow way of conceptualizing the notion of collocation; in practice, words are considered to form collocations when they occur in one another’s vicinity even if there are several intervening words, and even if the words do not always occur in the same order. I will therefore use the results of this additional research to inform the design of research questions and methods for future work engaging with yet more varied types of collocational pattern. This will have important implications for our understanding of how language works in the mind.

I would like to conclude by expressing my gratitude to the ESRC for providing funding for this Fellowship. I am very grateful to be given this opportunity to disseminate the results of my PhD thesis, and I am very excited to carry out further research on the psychological validity of collocation.

Change of Leadership in CASS

Andrew Hardie is delighted to announce that he has handed over his role of CASS Centre Director to Elena Semino.

Elena has been Head of Department for Lancaster’s Department of Linguistics and English Language for 6 years, and has published widely in the areas of stylistics, metaphor theory, and medical humanities/health communication.

In Elena’s own words: 

‘It is a great honour and challenge to take over as CASS Director. Over the last four years, CASS has led the way nationally and internationally in the application of corpus methods to a wide range of social scientific problems, and has had a significant impact on research, policy and practice in many different contexts. I look forward to working with colleagues in Lancaster, and partners in the UK and around the world, to continue and extend this work in years to come.’

 

New CASS PhD student!

CASS is delighted to welcome new PhD student Andressa Gomide to the centre, where she will be working on data visualization in corpus linguistics. Continue reading to find out more about Andressa!


I am in the first year of my PhD in Linguistics, which is focused on data visualizations for corpus tools. Being a research student at CASS, I am looking forward to gaining a better understanding of how different fields of study use corpus tools in their research.


I’ve been involved with corpus linguistics since 2011, when I started my undergraduate research program on learner corpora. Since then, I have developed a strong interest in corpus studies, which led me to devote my BA and my MA to this theme. I completed both my BA and my MA at the Universidade Federal de Minas Gerais in Brazil.

Aside from my interest in linguistics, I also enjoy outdoor activities such as cycling and hiking.

Birmingham ERP Boot Camp

Last week I attended a 5-day ERP Boot Camp at the University of Birmingham, and this was an incredible opportunity for me to learn from ERP experts and get specific advice for running my next ERP experiments. The workshop was led by two of the most renowned ERP researchers in the world, namely Professor Steven Luck and Dr Emily Kappenman. Luck and Kappenman are both part of the Centre for Mind and Brain at the University of California, Davis, which is one of the world’s leading centres for research into cognitive neuroscience. They are both among the researchers who set the publication guidelines and recommendations for conducting EEG research (Keil et al. 2014), and Luck is also the developer of ERPLAB, which is a MATLAB Toolbox designed specifically for ERP data analysis. Moreover, Luck is the author of the authoritative book entitled An Introduction to the Event-Related Potential Technique. Before attending the ERP Boot Camp, most of the knowledge that I had about ERPs came from this book. Therefore, I am extremely grateful that I have had this opportunity to learn from the authorities in the field, especially since Luck and Kappenman bring the ERP Boot Camp to the University of Birmingham just once every three years.

There were two parts to the ERP Boot Camp: 2.5 days of lectures covering the theoretical aspects of ERP research (led by Steven Luck), and 2.5 days of practical workshops which involved demonstrations of the main data acquisition and analysis steps, followed by independent data analysis work using ERPLAB (led by Emily Kappenman). Day 1 of the Boot Camp provided an overview of different experimental paradigms and different ERP components, which are defined as voltage changes that reflect a particular neural or psychological process (e.g. the N400 component reflects the processing of meaning and the P600 component reflects the processing of structure). Most of the electrical activity in the brain that can be detected by scalp electrodes comes from the surface of the cortex but, in the lecture on ERP components, I was amazed to find out that there are some ERP components that actually reflect brain stem activity. These components are known as auditory brainstem responses. I also learnt about how individual differences between participants are typically the result of differences in cortical folding and differences in skull thickness, rather than reflecting any functional differences, and I learnt how ERP components from one domain such as language can be used to illuminate psychological processes in other domains such as memory. From this first day at the Boot Camp, I started to gain a much deeper conceptual understanding of the theoretical basis of ERP research, causing me to think of questions that hadn’t even occurred to me before.

Day 2 of the Boot Camp covered the principles of electricity and magnetism, the practical steps involved in processing an EEG dataset, and the most effective ways of circumventing and minimizing the problems that are inevitably faced by all ERP researchers. On this day I also learnt the importance of taking ERP measurements from difference waves rather than from the raw ERP waveforms. This is invaluable knowledge to have when analysing the data from my next experiments. In addition, I gained some concrete advice on stimulus presentation which I will take into account when editing my stimuli.
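As a simple illustration of what measuring from a difference wave involves, the sketch below averages single-trial epochs into an ERP for each condition, subtracts one ERP from the other, and then takes the mean amplitude in a time window of the result. The voltages, the window, and the function names are all invented; real analyses would typically be carried out in a dedicated toolbox such as ERPLAB.

```python
def average_waveform(trials):
    """Average equal-length single-trial epochs sample by sample."""
    n_trials = len(trials)
    return [sum(samples) / n_trials for samples in zip(*trials)]


def difference_wave(erp_a, erp_b):
    """Subtract one ERP from another, sample by sample."""
    return [a - b for a, b in zip(erp_a, erp_b)]


def mean_amplitude(waveform, start, stop):
    """Mean voltage over the samples waveform[start:stop]."""
    window = waveform[start:stop]
    return sum(window) / len(window)


# Invented voltages (in microvolts) for four time points per trial.
non_collocation_trials = [[0, -2, -4, -2], [0, -4, -6, -2]]
collocation_trials = [[0, -1, -2, -1], [0, -1, -2, -1]]

erp_non_collocation = average_waveform(non_collocation_trials)
erp_collocation = average_waveform(collocation_trials)

# The measurement is taken from the difference wave, not from either raw ERP.
diff = difference_wave(erp_non_collocation, erp_collocation)
effect = mean_amplitude(diff, 1, 3)
print(diff, effect)
```

Measuring from the difference wave isolates the part of the signal that differs between the two conditions, which is the quantity of interest when comparing them.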

On day 3 of the Boot Camp, we were shown examples of ‘bad’ experimental designs and we were asked to identify the factors that made them problematic. Similarly, we discussed how to identify problematic results just by looking at the waveforms. These were really useful exercises in helping me to critically evaluate ERP studies, which will be useful both when reading published articles and when thinking about my own experimental design.

From the outset of the Boot Camp, we were encouraged to ask questions at any time, and this was particularly useful when it came to the practical sessions as we were able to use our own data and ask specific questions relating to our own experiments. I came prepared with questions that I had wanted to know the answers to for a long time, as well as additional questions that I had thought of throughout the Boot Camp, and I was given clear answers to every one of these questions.

Furthermore, as well as acquiring both theoretical and practical knowledge from the scheduled lectures and workshops, I also gained a lot from talking to the other ERP researchers who were attending the Boot Camp. A large proportion of attendees focused on language as their main research area, while others focused on clinical psychology or other areas of psychology such as memory or perception. I found it really interesting to hear the differences of opinion between those who were primarily linguists and those who were primarily psychologists. For instance, when discussing the word-by-word presentation of sentences in ERP experiments, the psychologists stated that each word should immediately replace the previous word, whereas the linguists concluded that it is best to present a blank white screen between each word. Conversations such as this made it very apparent that many of the aspects of ERP research are not standardised, and so it is up to the researcher to decide what is best for their experiment based on what is known about ERPs and what is conventional in their particular area of research.

Attending this ERP Boot Camp was a fantastic opportunity to learn from some of the best ERP researchers in the world. I now have a much more thorough understanding of the theoretical basis of ERP research, and I have an extensive list of practical suggestions that I can apply to my next experiments. I thoroughly enjoyed every aspect of the workshop and I am very grateful to CASS for funding the trip.

Participants needed for psycholinguistic experiment!

My PhD research combines methods from corpus linguistics and psychology in order to find out more about how language is processed in the brain. The method that I use from psychology is known as electroencephalography (EEG), and this involves placing electrodes across a participant’s scalp in order to detect some of the electrical activity of the brain. More specifically, I use the event-related potential (ERP) technique, which involves measuring the electrical activity of the brain in response to particular stimuli. When I carried out my pilot study earlier this year, this was the first time the EEG/ERP method had been used in the Department of Linguistics and English Language, making it a really exciting project to get involved with.

Having completed my pilot study and obtained some really interesting results, I have refined my methods and hypotheses and I am now ready to recruit participants for my next two experiments. For one experiment which will take place in late August, I am looking for 16 native speakers of Mandarin Chinese; for another experiment which will take place in October, I am looking for 16 native speakers of English. I would really appreciate hearing from anyone who is interested in taking part! The whole procedure takes about 1 hour; it takes about 20-30 minutes for me to attach all of the electrodes, and the experiment itself takes an additional 20-30 minutes.

If you do decide to take part, you will wear a headcap containing 64 plastic electrode holders which the electrodes are clipped into, as well as 6 electrodes around your eyes and 2 electrodes behind your ears. The electrodes make contact with your skin via a conductive gel which enables some of the electrical signals in your brain to propagate to the electrode wires and into the AD-box, where the electrical signal is amplified and converted from analog to digital format. The amplified signals are then transmitted to the USB2 receiver via a fibre-optic cable, before being relayed onto the data acquisition computer where your brainwaves can be viewed as a continuous waveform. Before starting the experiment, I will ask you to blink, clench your teeth, and move your head from left to right so that you can see how these movements affect the observed waveform.


The experiment itself involves reading real language data that has been extracted from the British National Corpus. This consists of sentences which are presented word-by-word on a computer screen. After reading each sentence, you will be asked to respond to a true/false statement based on the sentence that you have just read.

Before conducting my pilot study, I carried out a number of test-runs on other postgraduate students and each one of them found it to be a really interesting experience. For instance, Gillian Smith, another PhD research student in CASS, agreed to take part in one of my test-runs and here she describes her experience as a participant:

“Getting to be involved in Jen’s experiment was a great opportunity! Having never participated in such a study before, I found the whole process (which Jen explained extremely well) very interesting. I particularly enjoyed being able to look at my brainwaves after, which is something I have never experienced. Likewise, having electrodes on my head was a lovely novelty.”



I would really like to hear from any native speakers of Mandarin Chinese or native speakers of English who would be interested in taking part in one of these experiments. Please email j.j.hughes@lancaster.ac.uk to express interest and to receive more information.

Corpus Data and Psycholinguistics Seminar

On the afternoon of Thursday 19th May 2016, CASS held its first ever psycholinguistics seminar which brought together researchers from both linguistics and psychology. The theme of the seminar was “Corpus Data and Psycholinguistics”, with a particular focus on experimental psycholinguistics.

The afternoon consisted of four 40-minute presentations which covered a range of different experimental methods including eye-tracking and EEG. Interestingly, the notion of collocation also emerged as a strong theme throughout the presentations. Different types of collocation were addressed, including bigrams, idioms, and compounds, and this prompted thought-provoking discussions about the nature of collocation and the relationship between psycholinguistic results and the different statistical measures of collocation strength.

The first presentation was delivered by Professor Padraic Monaghan from the Psychology Department at Lancaster University. In this presentation, Padraic provided an engaging introduction to computational modelling in psycholinguistics, focusing mainly on connectionist models where the input determines the structure of processing. This talk prompted a particularly interesting observation about the relationship between connectionist models and part-of-speech tags in corpora.

In the second presentation, Dr Phil Durrant from the University of Exeter provided a critical perspective on his own earlier work into whether or not psycholinguistic priming is evident in collocations at different levels of frequency, and on the distinction between the related notions of collocation and psychological association. This presentation also provided a really interesting insight into the different ways in which corpus linguistics and psychological experimentation can be combined in psycholinguistic studies. This really helped to contextualise the studies reported in the other presentations within the field of psycholinguistics.

After a short break, I presented the results of the first of several studies which will make up my PhD thesis. This initial study pilots a procedure for using EEG to determine whether or not the brain is sensitive to the transition probabilities between words. This was an excellent opportunity for me to gain feedback on my work and I really appreciate the input and suggestions for further reading that I received from participants at this event.

The final presentation of the afternoon was delivered by Professor Michaela Mahlberg and Dr Gareth Carrol from the University of Birmingham. This presentation drew upon eye-tracking data from a study exploring literary reading in order to pinpoint the methodological issues associated with combining eye-tracking techniques with literary corpora, and with corpus data more generally.

With such an interesting series of talks sharing the theme of “Corpus Data and Psycholinguistics”, the CASS psycholinguistics seminar proved to be a very successful event. We would like to thank the presenters and all of the participants who attended the seminar for their contribution to the discussions, and we are really looking forward to hosting similar seminars in the near future.

Upcoming CASS Psycholinguistics Seminar

CASS is excited to announce an upcoming half-day research seminar on the theme of “Corpus Data and Psycholinguistics”. The event will take place on Thursday 19th May 2016 at 1-5pm in Furness Lecture Theatre 3.

The aim of the event is to bring together researchers with an interest in combining methods from corpus linguistics and psycholinguistics. In particular, there will be a focus on experimental psycholinguistics. It is set to be an exciting afternoon consisting of four 40-minute presentations from both internal and external speakers. Professor Padraic Monaghan from the Department of Psychology will be giving an introduction to computational modelling in psycholinguistics, and I will be presenting my work on investigating the processing of collocation using EEG. Furthermore, Dr Phil Durrant from the University of Exeter will be giving a talk entitled “Revisiting collocational priming”, and Professor Michaela Mahlberg from the University of Birmingham will be discussing the methodological issues associated with combining eye-tracking techniques with corpus data.

You can find out more about these talks from the abstracts below.


Padraic Monaghan, Lancaster University

Computational modelling of corpus data in psycholinguistic studies

Computational models of language learning and processing enable us to determine the inherent structure present in language input, and also the cognitive mechanisms that react to this structure. I will give an introduction to computational models used in psycholinguistic studies, with a particular focus on connectionist models where the structure of processing is derived principally from the structure of the input to the model.


Phil Durrant, University of Exeter

Revisiting collocational priming

Durrant & Doherty (2010) evaluated whether collocations at different levels of frequency exhibit psycholinguistic priming. It also attempted to untangle collocation from the related phenomenon of psychological association by comparing collocations which were and were not associates. Priming was found between high-frequency collocations but associated collocates appeared to exhibit more deep-rooted priming (as reflected in a task designed to reflect automatic, rather than strategic processes) than those which were not associated. This presentation will critically review the 2010 paper in light of more recent work. It will re-evaluate the study itself and suggest ways in which research could be taken forward.

Durrant, P., & Doherty, A. (2010). Are high-frequency collocations psychologically real? Investigating the thesis of collocational priming. Corpus Linguistics and Linguistic Theory, 6(2), 125-155.


Jennifer Hughes, Lancaster University

Investigating the processing of collocation using EEG: A pilot study

In this presentation, I discuss the results of an EEG experiment which pilots a procedure for determining whether or not there is a quantitatively distinct brain response to the processing of collocational bigrams compared to non-collocational bigrams. Collocational bigrams are defined as adjacent word pairs which have a high forward transitional probability in the BNC (e.g. crucial point), while non-collocational bigrams are defined as adjacent word pairs which are semantically plausible but are absent from the BNC (e.g. crucial night). The results show that there is a neurophysiological difference in how collocational bigrams and non-collocational bigrams are processed.


Michaela Mahlberg, Kathy Conklin, and Gareth Carrol, University of Birmingham

Exploring corpus-attested patterns in Dickens’s fiction – methodological challenges of using eye-tracking techniques

The study of the relationship between patterns and meanings is a key concern in corpus linguistics. The data that corpus linguists work with, however, only provides a partial picture. In this paper, we will look at how questions of frequencies in corpora can be related to questions raised by data from eye-tracking studies on reading times. We will also discuss challenges of designing experiments to address these questions. As a case study, we focus on examples of patterns identified in Dickens’s fiction, but the methodological issues we address have wider implications beyond the study of literary corpora.


The event is free to attend and is open to both internal and external attendees. If you are an external guest, please email j.j.hughes@lancaster.ac.uk so we know that you intend to come.

We are really looking forward to this event as it will be an exciting opportunity to share ideas regarding the different approaches to using corpus data in experimental psycholinguistics.

Participants needed for EEG experiment!

For my PhD I am trying to find out how language is processed in the brain by combining methods from corpus linguistics and psycholinguistics. Specifically, I have extracted real language data from the British National Corpus and modified this data so that it can be presented to participants in an electroencephalography (EEG) experiment. In EEG experiments, electrodes are placed on a participant’s head and these electrodes detect some of the electrical activity that occurs in the participant’s brain in response to particular stimuli. EEG experiments are frequently conducted in Lancaster’s Psychology Department but they have not yet been conducted in the Department of Linguistics and English Language, so it’s really exciting to try out this method which is new to the department.

When conducting an EEG experiment, I start by taking head measurements and then placing a headcap on the participant’s head. This headcap contains 64 electrode holders which I fill with conductive gel before placing an electrode into each one. I also attach some additional electrodes behind the ears and around the eyes. Once all of the electrodes are in place, the stimuli are displayed to the participant on a computer screen. These stimuli consist of sentences that are presented word-by-word, as well as true/false statements that are presented as whole sentences. Participants just need to read the word-by-word sentences and respond to the true/false statement by pressing either the ‘T’ or the ‘F’ key on the keyboard. While they’re doing this, the electrodes detect some of the electrical activity that is happening in the brain, and this information is sent to another computer which displays the electrical activity as a continuous waveform. The setup of the experiment can be seen in the diagram below.

[Diagram: setup of the EEG experiment]

Throughout my PhD I will be conducting a series of experiments starting with a pilot study. In my pilot study, the experiment itself lasts for just 10 minutes but it can take me up to an hour to attach all of the electrodes. This preparation time should decrease as I carry it out on more and more participants.

I have already conducted several practice runs of my experiment with other postgraduate students. For example, Gillian Smith, another PhD research student in CASS, agreed to take part in one of my practice runs and here she describes her experience as a participant:


“Getting to be involved in Jen’s experiment was a great opportunity! Having never participated in such a study before, I found the whole process (which Jen explained extremely well) very interesting. I particularly enjoyed being able to look at my brainwaves after, which is something I have never experienced. Likewise, having electrodes on my head was a lovely novelty.”

 

 

I am currently looking for 15 native speakers of English to take part in my pilot study.

If you are interested in taking part in this experiment please email j.j.hughes@lancaster.ac.uk for more information.

CASS PhD student in Moscow to attend the XVI April International Academic Conference on Economic and Social Development

I recently got the opportunity to travel to Moscow to attend the XVI April International Academic Conference on Economic and Social Development at the National Research University – Higher School of Economics (HSE). This conference covered a wide variety of fields including Sociology, Geography, and Technology, and, on the last day of the conference, there was a seminar specifically for Linguistics PhD students. The aim of this seminar was to allow students from Russia and other countries to exchange ideas, and to introduce students from around the world to HSE.

At the seminar, there were presentations from 10 PhD students and these covered a variety of Linguistics topics including Grammar, Semantics, Sign Language, and Cognitive Linguistics. There were also some presentations on Corpus Linguistics: one which discussed semantic role labelling for the Russian language based on the Russian FrameBank, and another which discussed building a corpus of Soviet poetry. I found it interesting to see corpus analyses based on the Russian language, and it was also interesting to see the use of the ‘web as corpus’. This introduced me to tools that I haven’t used before, such as the Google Ngram Viewer.

In the afternoon, I gave a presentation entitled The collocation hypothesis: Evidence from self-paced reading. This was the first time I had ever given a conference presentation and I was really pleased to have an audience that seemed interested in my work. The audience was composed of PhD students, some undergraduate students from the Linguistics Department at HSE, researchers from other fields who had presented at the conference on the previous days, as well as a few senior academics who gave me some really useful feedback.

The conference was held at the central building of HSE and, the day before the seminar, an MA student in Computational Linguistics kindly gave me a tour of the Linguistics Department. It was interesting to see that their classes are all seminar-based and I particularly liked the way they had a common room where all members of the department, including undergraduates, postgraduates, and lecturers, go between classes in order to socialise or do work. Here, I got the chance to speak to some undergraduates and postgraduates and I was shown some of the corpora that were compiled at that department, such as the Corpus of Modern Yiddish, the Bashkir Poetic Corpus, and the Russian Learner Corpus of Academic Writing. I was also told about a project called Tolstoy Digital, which involved making a corpus of Tolstoy’s works. It was interesting to hear about the unique problems that were faced when compiling this corpus. For instance, Tolstoy used an older orthography so this had to be translated to the modern form before the corpus could be tagged and parsed.

When speaking to members of the department, it was also interesting to discuss how some of their work links to some of the work carried out at CASS and the Linguistics Department at Lancaster University. For example, Elena Semino’s work on pain questionnaires seemed to link closely to an article written by members of HSE entitled Towards a typology of pain predicates (Reznikova et al. 2012). This article discusses the way in which the semantic domain of pain is largely composed of words borrowed from other semantic domains.

After showing me around the department, the MA student, Natalia, showed me around some of the main sights in central Moscow. I really appreciated this as I got to see some of Moscow from a local’s perspective as well as getting to visit some of the key sights that I was looking forward to seeing such as the Bolshoi Theatre. Whilst in Moscow, I also went to see Swan Lake at the Kremlin Theatre of Classical Russian Ballet. This was an amazing experience because I had always wanted to see a Russian ballet and, although I had already seen Swan Lake several times, this was definitely the best version I had ever seen. Overall I had a brilliant time in Moscow and I am really grateful to the Higher School of Economics for funding and organising the trip.