My experience with the Corpus MOOC

Lancaster University’s MOOC in Corpus Linguistics has been hugely important to me during my doctoral research, and I’ve taken it each year since it was first offered in 2014. This is not because I’m an especially slow learner or because I was unsuccessful in my previous attempts – it’s because the course has so much to offer that it’s impossible to appreciate all of its different aspects in one go; it requires repeated visits as understanding deepens and new questions emerge.

When I first took the course, I knew nothing about corpus linguistics (or MOOCs for that matter) and went through each week’s materials at a very introductory level, trying to get a handle on the principles and terminology while also learning about tools and techniques. At first, I was apprehensive, ready to bail at any sign of discomfort, but I found the lectures not only easy to follow, but also thoroughly enjoyable and endlessly fascinating. I was hooked! Although the course itself spanned eight weeks, the materials were available on the website long after the course was over. This allowed me to revisit and review tutorials whenever I felt unsure about something, and also to start focusing on areas that aligned with my own research interests.

The following year, with the basics under my belt, I decided to take the course for a second time with the intention of tackling the content in more depth and also using my own data for the tutorials. What I found was that the multiple layers of the course became extremely valuable as I grew more comfortable with different concepts and research in the field, and that my approach to the course had changed. Instead of following the course week by week as I had done the first time, I started to pick and choose different aspects that matched particular stages of my own research.

The third time I took the course, I was driven by an interest in the advanced materials as well as the discussions and comments by other students and mentors. I had so many questions arising from my own research that I felt it would be helpful to hear what others had to say about their own. The forum became an incredibly valuable resource and one that I had not appreciated as a beginner. It is extremely generous of Lancaster to offer such a fantastic course with all the support, resources, knowledge and materials and ask for nothing in return.

And now, even though I’ve completed my doctoral research, I’ve registered for the course for the fourth time. It is such an incredibly diverse and fascinating course, with so many layers and areas of interest, that there is still a great deal for me to learn. And the numerous scholars discussing their research have an enthusiasm and passion for their work that is both infectious and inspirational. Perhaps my husband is right: I’ve become addicted to this MOOC!


The next Corpus MOOC starts 25 September 2017. You can register for free at https://www.futurelearn.com/courses/corpus-linguistics.

The course is intended for anyone interested in quantitative language analysis – no prior knowledge of linguistics or corpora is required.

Would you like to share your experience of the Corpus MOOC? Include #CorpusMOOC in your tweets or other social media posts or get in touch via v.brezina(Replace this parenthesis with the @ sign)lancaster.ac.uk.

CASS PhD Student Tanjun Liu wins Best Poster Award at EUROCALL2017

In late August, I attended the 25th annual conference of EUROCALL (European Association for Computer Assisted Language Learning) at the University of Southampton. This year’s theme was how Computer-Assisted Language Learning (CALL) responds to changing global circumstances that affect education. Over 240 sessions were presented, covering topics such as computer-mediated communication, MOOCs, social networking, corpora, European projects and teacher education.

At this conference, I presented a poster entitled “Evaluating the effect of data-driven learning (DDL) on the acquisition of academic collocations by advanced Chinese learners of English”. DDL is a term coined by Tim Johns in 1991 to refer to the use of authentic corpus data in student-centred discovery learning activities. However, although many corpus-based studies in the pedagogical domain have recommended using corpora in classroom teaching, DDL has yet to become mainstream teaching practice. My research therefore sets out to examine the contribution of DDL to the acquisition of academic collocations in the Chinese university context.

The corpus tool that I used in my research was #LancsBox (http://corpora.lancs.ac.uk/lancsbox/), a new corpus tool developed at CASS which can create collocational networks (GraphColl). My poster reported a five-week pilot study. The results show that the learners’ attitudes towards using #LancsBox were mostly positive, but that there were no statistically significant differences between using the corpus tool and using an online collocations dictionary, which may largely be due to the very short intervention time in the pilot study. The poster also described the forthcoming main study, which will involve longer exposure and more EFL learners.

At this conference I was fortunate enough to win the EUROCALL2017 Best Poster Award (PhD), which is given to the best poster presented by a PhD student, as nominated by conference delegates. Thank you to all of the delegates who voted for me; it was a real pleasure to attend such a wonderful conference!

CASS at Corpus Linguistics 2017

The biennial Corpus Linguistics conference first took place in 2001 at Lancaster, and the 2017 conference at Birmingham was its ninth outing. The conference lasted four days, with an additional day for workshops; this blog post details CASS participation at the event.

On Monday 24th July, CASS ran two pre-conference workshops: Vaclav Brezina and Matt Timperley’s workshop was based around #LancsBox, a tool which can create collocational networks, while Robbie Love and Andrew Hardie introduced the Spoken BNC2014. CASS members also gave presentations in the Corpus Approaches to Health Communication pre-conference workshop: Paul Baker spoke on NHS patient feedback, Elena Semino on the assessment of a diagnostic pain questionnaire, and Karen Kinloch gave two talks, on discourses around IVF treatment and around postnatal depression (her second talk was co-presented with Sylvia Jaworska).

On the first day of the conference proper, CASS Director Andrew Hardie gave a plenary entitled Exploratory analysis of word frequencies across corpus texts: towards a critical contrast of approaches, which involved a “for one night only” Topic Modelling analysis demonstrating some of the problems and assumptions behind this approach. Key points were illustrated with a friendly picture of a Gigantopithecus (pictures of dinosaurs and other extinct creatures were used in several talks, perhaps suggesting a new theme for CL research). The plenary can be watched in full at https://www.youtube.com/watch?v=ka4yDJLtSSc

A number of conference talks involved the creation and analysis of the new 2014 British National Corpus: Abi Hawtin presented on how she developed parameters for the written section, and Robbie Love discussed swearing in the spoken section of the BNC2014. Vaclav Brezina and Matt Timperley discussed a proposal for standardised tokenization and word counting, using the new BNC as an exemplar, while Susan Reichelt examined ways of adapting the BNC for sociolinguistic research, with a case study of negative concord.

In terms of other corpus creation projects, Paul Rayson, Scott Piao and a team from Cardiff University discussed the creation of a Welsh semantic tagger for use in the CorCenCC project.

Two talks involved uses of corpus linguistics in teaching. Gillian Smith described the creation and analysis of a corpus of interactions in Special Educational Needs classrooms, with the goal of investigating teacher scaffolding, while Liam Blything, Kate Cain and Andrew Hardie analysed a half-million-word corpus of teacher-child interactions during guided reading sessions.

Regarding work examining discourse and representation using corpus approaches, Carmen Dayrell presented her work with Helen Baker and Tony McEnery on a diachronic analysis of newspaper articles about droughts, which combines corpus approaches with GIS (Geographical Information Systems). GIS was also used by Laura Paterson and Ian Gregory to map text analysis of poverty in the UK, while Paul Baker and Mark McGlashan reported on their work on representations of Romanians in the Daily Express, comparing articles with online reader comments. A fourth paper, by Jens Zinn and Daniel McDonald, considered changing understandings of the concept of risk in English-language newspapers.

Collocation was also a popular topic in CASS presentations. Jen Hughes investigated native and non-native processing of collocations in an experimental study using electroencephalography (EEG), which measures electrical potentials in the brain, while Doğuş Can Öksüz and Vaclav Brezina took another approach, examining adjective-noun collocations in Turkish and English. A third collocation study, by Dana Gablasova, Vaclav Brezina and Tony McEnery, involved the empirical validation of MI-score-based collocations for language learning research.

Finally, Jonathan Culpeper and Amelia Joulain-Jay talked about an affiliated CASS project on creating an Encyclopaedia of Shakespeare’s language. They discussed issues surrounding spelling variation and part-of-speech tagging, and gave two case studies (involving the words I and good).

The conference brought together corpus linguists from dozens of countries (including Germany, Finland, Spain, Israel, Japan, Brazil, Iran, the Netherlands, USA, New Zealand, Taiwan, Ireland, China, Czech Republic, Italy, Sweden, Poland, Chile, UK, Hong Kong, Norway, Australia, Belgium, Canada, South Africa and Venezuela) and was a great opportunity to share and hear about developing work in the field. There was a lively Twitter presence throughout the conference under the tag #CL2017bham. However, my favourite tag was #HardiePieChartWatch, which had me going back to my slides to check whether I had used pie charts appropriately. Be careful with your pie charts!

The next conference will be held (for the first time) in Cardiff – I hope to see you there in two years.

More pictures of the conference can be found at https://www.flickr.com/photos/artsatbirmingham/sets/72157684181373191

How to Produce Vocabulary Lists

As part of the Forum discussion in Applied Linguistics, we have formulated some basic principles of corpus-based vocabulary studies and pedagogical wordlist creation and use. These principles can be summarised as follows:

  1. Explicitly define the vocabulary construct.
  2. Operationalize the vocabulary construct using transparent and replicable criteria.
  3. If using corpora, take corpus evidence seriously and avoid cherry-picking.
  4. Use multiple sources of evidence to test the validity of the vocabulary construct.
  5. Do not rely on your intuition/experience to determine what is useful for learners; collect evidence about learner needs to evaluate the usefulness of the list.
  6. Do not present learners with a decontextualized list of lexical items; use/create contextualized materials instead.
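To make principle 2 concrete, here is a minimal sketch of a wordlist built from transparent, replicable criteria: a minimum normalised frequency and a minimum range (the number of texts in which a word occurs). The thresholds, the whitespace tokenisation and the sorting order are hypothetical choices for illustration, not criteria proposed in the Forum articles; the point is only that every decision is stated explicitly and can be re-run by someone else on the same data.

```python
from collections import Counter

# Hypothetical criteria, stated up front so the list can be rebuilt exactly.
MIN_FREQ_PER_MILLION = 50.0  # minimum normalised frequency
MIN_RANGE = 2                # minimum number of texts containing the word


def build_wordlist(texts, min_fpm=MIN_FREQ_PER_MILLION, min_range=MIN_RANGE):
    """Return (word, frequency per million, range) for words meeting both criteria."""
    freq = Counter()
    rng = Counter()
    total = 0
    for text in texts:
        tokens = text.lower().split()  # simple, documented tokenisation
        total += len(tokens)
        freq.update(tokens)
        rng.update(set(tokens))  # each text adds at most 1 to a word's range
    selected = []
    for word, f in freq.items():
        fpm = f / total * 1_000_000
        if fpm >= min_fpm and rng[word] >= min_range:
            selected.append((word, round(fpm, 1), rng[word]))
    # Widely dispersed words first, then more frequent ones, then alphabetical.
    selected.sort(key=lambda item: (-item[2], -item[1], item[0]))
    return selected


if __name__ == "__main__":
    corpus = ["the cat sat", "the dog ran", "a cat ran"]
    for word, fpm, text_range in build_wordlist(corpus):
        print(word, fpm, text_range)
```

On the three toy texts, only cat, ran and the pass the range criterion; sat, dog and a each occur in a single text and are excluded, however frequent they might be.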

To find out more, you can read:

Brezina, V. & Gablasova, D. (2017). How to Produce Vocabulary Lists? Issues of Definition, Selection and Pedagogical Aims. A Response to Gabriele Stein. Applied Linguistics, doi:10.1093/applin/amx022.

CASS Guided Reading project presented to the Society for the Scientific Study of Reading (SSSR)

In mid-July, it was my pleasure to represent CASS at the SSSR conference in Nova Scotia, Canada! Over 400 professionals attended, including language and literacy researchers, school teachers, and speech and language therapists.

My primary aim was to demonstrate how our CASS language development project is using corpus search methods to assess the effectiveness of the teacher strategies used in guided reading classroom interactions (see also part 1 & part 2 of my project introduction blogs). The best opportunity for this was my poster presentation, which highlighted our first round of findings on the types of questions that teachers ask children.

First, we demonstrated that teachers are following recommended guidelines to ask a lot of wh-questions (why, how, what, when, etc.): wh-questions typically make up around 20% of the total questions asked in normal adult conversation, but made up 40% of the total questions asked by teachers in our spoken classroom interactions.

Second, the poster presented initial findings on our developmental question of whether teachers of older children ask more challenging question types than teachers of younger children. So far, however, our chosen categories of question type were used equivalently across year groups, which prompts a follow-up study to examine whether finer-grained categories of question type differ in their proportion of use across year groups.

Third, the poster reported that teachers at schools in low socio-economic-status (SES) regions asked a higher proportion of wh-questions than teachers at schools in high SES regions. Most viewers of the poster agreed that this prompts us to look at children’s responses: the high proportion of wh-questions asked by teachers at schools in low SES regions might be shaped by less engaged answers from low SES children, which require more follow-up wh-questions, relative to the typically more engaged answers provided by high SES children.

Although a number of other posters throughout the week examined classroom interactions, none had taken advantage of the precise, fast and reliable search methods that we are using. Attendees were therefore very impressed by how we have been able to interrogate our large corpus without being restricted by the amount of manual coding that can be achieved within a realistic time window.

Finally, a big thanks to CASS and SSSR for making this visit possible. As well as the incredible learning opportunities provided by the wide range of high-quality presentations on reading research, I also had a good time meeting fun and interesting conference attendees – and local Canadians too! Nova Scotia is a beautiful place to visit, with a very friendly and youthful demographic.


Liam will be presenting a talk on this project at the Corpus Linguistics 2017 conference on Wednesday 26th July at 4pm in Lecture Theatre 117, Physics Building, University of Birmingham. For updates, watch this space and follow @CorpusSocialSci and @LiamBlything on Twitter.

User Involvement: CASS go to CLARIN PLUS workshop

At the beginning of June, I attended the CLARIN PLUS workshop on User Involvement held in Helsinki, Finland. CLARIN stands for “Common Language Resources and Technology Infrastructure”; it is an international research infrastructure which provides scholars in the social sciences and humanities with easy access to digital language data, and also to advanced tools for handling those data sets. The main purpose of the workshop was to share information, good practice, expertise, and ideas on how potential and current users can most benefit from CLARIN services.

I was representing Lancaster University as part of the UK branch of CLARIN, which is led by Martin Wynne at Oxford. Some of the participants, representing CLARIN’s different national consortia, shared success stories about their involvement with their local communities.

At the workshop, Johanna Berg, from Sweden, and Mietta Lennes, from Finland, showed us how they made innovative use of the roadshow event format to present language resources across different institutions in their countries. Mietta also gave us a taste of the very useful tools and corpora that you can find at The Language Bank of Finland.

Another fruitful example presented at the workshop was the Helsinki Digital Humanities Hackathon (DHH). The event, now in its third edition, brings together researchers from computer science, humanities and social sciences for a week of intensive work sharing a diversity of skills. Eetu Mäkelä, one of the organisers of the DHH, demonstrated that it is possible to engage researchers from very different backgrounds and have them working in a complementary way. The impressive results of last year’s edition can be checked out at the DHH16 website.

At the end of two productive days, Darja Fišer, director of CLARIN-ERIC User Involvement, wrapped up the event by presenting other success stories from several institutions connected to CLARIN. One of the success stories she mentioned was the Corpus Linguistics: Method, Analysis, Interpretation MOOC offered by CASS, which will be running again in Autumn this year (you can register your interest here!). Darja also highlighted the importance of events such as summer schools in reaching out to more users. Indeed, Darja shared some incredible resources and insightful ideas at our recent Summer Schools in Corpus Linguistics and other Digital methods (#LancsSS17). Make sure you read our next blog post for a summary of the summer school week!

Spoken BNC2014 Symposium

On the afternoon of Monday 26th June, CASS hosted a special symposium to celebrate the upcoming public launch of the Spoken British National Corpus 2014 – a corpus which members of CASS and Cambridge University Press have spent the last three years compiling.

More than fifty guests attended, representing a mixture of Lancaster Summer Schools participants, members of the CASS Challenge Panel, and those who travelled to Lancaster just for the day.

To kick off the symposium, CASS Centre Director Andrew Hardie said a few words about the history of Corpus Linguistics at Lancaster University, and put the compilation of a new BNC into context against previous developments in the field. He expressed his delight at the interest in the Spoken BNC2014 project as evidenced by the number of guests who were in attendance for the symposium.

I then gave the first talk alongside Claire Dembry (from Cambridge University Press) and Andrew Hardie, as representatives of the Spoken BNC2014 research team which also includes Vaclav Brezina and Tony McEnery. We discussed the main methodological decisions we made when thinking about the design, data collection, transcription and processing of the corpus. Andrew then gave a quick demonstration of the corpus in CQPweb, showing how features including speaker IDs, overlaps and attribution confidence are displayed in the interface.

Following our talk came the first of four research presentations, all of which used (the early access subset of) the Spoken BNC2014. The first of these was a talk by Karin Aijmer (University of Gothenburg) about the intensifier fucking, which went down very well with the audience. Karin’s Spoken BNC2014 research, which also includes other intensifiers, will be published as a chapter in Brezina et al. (forthcoming).

After a short break for refreshments, Jacqueline Laws (University of Reading) presented research into verb-forming suffixation which she had undertaken with Chris Ryder and Sylvia Jaworska. Comparing the demographically-sampled component of the Spoken BNC1994 to the new Spoken BNC2014, she found that females now appear to produce more neologisms (e.g. favouritize, popify) compared to males. Laws et al.’s research will be published in a forthcoming special issue of the International Journal of Corpus Linguistics.

Susan Reichelt (Lancaster University) was next to present her work on producing sociolinguistically comparable subsets of both the original and new Spoken British National Corpora. She highlighted a point which I had touched upon in my earlier talk: that the compilation of the Spoken BNC2014 sought to strike a balance between direct comparability with the original corpus on the one hand, and methodological improvement on the other. The areas where improvement was favoured over comparability (e.g. the classification of speaker socio-economic status) ought to be considered especially when thinking about sociolinguistic analysis. Susan’s work is associated with the recently announced CASS SDA project.

Finally, Jonathan Culpeper and Mathew Gillings (Lancaster University) presented their work on politeness variation between the north and south of England. They aimed to assess the extent to which commonly held stereotypes about differences between northern and southern politeness were reflected in language use, treating the original and new corpora as a single dataset. Their work will be published as a chapter in Brezina et al. (forthcoming).

My reaction as the organiser of the symposium was that there is definitely a sense of anticipation about the release of the Spoken BNC2014, which is planned to take place in the autumn. Furthermore, it was lovely to meet so many friendly and enthusiastic attendees. I am very grateful to each of the speakers for giving such interesting talks, and to all who attended – especially those who tweeted their reactions to the talks using the #BNC2014 hashtag! As one of my final duties as a member of CASS before moving on to pastures new, I am very glad that the symposium went as well as it did.

More on drought: The ENDOWS project

We are thrilled to announce that our latest bid was successful – ENDOWS: ENgaging diverse stakeholders and publics with outputs from the UK DrOught and Water Scarcity programme.

The ENDOWS project will capitalise on the outcomes from four existing projects within NERC’s UK Drought and Water Scarcity programme (Historic Droughts, DRY, MaRIUS and IMPETUS) to maximise impact. ENDOWS will exploit the synergies between these projects, promote active interaction among disciplines, and develop close collaboration with a diverse range of stakeholders (policy makers, water companies, NGOs and community leaders). The project therefore opens up the possibility of genuinely enhancing, and bringing innovation to, UK planning and management of future drought events.

Funded by NERC, this is a two-year project which brings together a multi-disciplinary team of 37 researchers from 12 UK universities or research centres and Climate Outreach, one of Europe’s leading voices on public engagement:

  • Centre for Ecology and Hydrology (CEH)
  • University of the West of England
  • University of Oxford
  • Cranfield University
  • The University of Reading
  • University of Bristol
  • British Geological Survey (BGS)
  • Sheffield University
  • Harper Adams University
  • University of Exeter
  • Lancaster University
  • Loughborough University
  • University of Warwick

Carmen Dayrell is the Co-Investigator at Lancaster University. She has been working with Tony McEnery and Helen Baker as part of the CASS team within the Historic Droughts project. CASS is examining how British newspapers have debated drought and water scarcity events in the UK, covering 200 years of discourse: 1850 to 2014. The analysis uses an innovative methodological approach which combines Critical Discourse Analysis with methods from Corpus Linguistics and GIS (Geographic Information Systems), enabling the researchers to examine the link between textual patterns and geographic references and hence explore geographically bounded discourses.

To illustrate the interesting results that this type of analysis can yield, let’s have a look at the places appearing around the word “drought” in newspaper texts published between 2010 and 2012. These were years when England and Wales were hit hard by drought. The bigger the dot in the maps, the higher the number of mentions.

Figure 1: 2010-2012 drought in Britain (tabloid)

Figure 2: 2010-2012 drought in Britain (broadsheet)

A closer reading of texts unveils interesting patterns:

  • The press does not always name the specific locations affected by the drought: Britain and England were by far the places most frequently mentioned.
  • When mentioning specific locations, these were usually in England.

Large areas of Britain face drought conditions, the Environment Agency said. Parts of the Midlands and Yorkshire are expected to be declared high risk in the agency’s drought prospects report.

The Telegraph, 11/03/2012

  • Rather than being portrayed as affected by the drought, Scotland was presented as a solution to the problem, since it is rich in water resources.

SCOTLAND yesterday offered to provide water to drought-hit Southern England. Infrastructure Secretary Alex Neil said it was “only right” to offer some of Scotland’s “plentiful supply of water”.

The Express, 10/03/2012

  • The newspapers also report on actions taken to address the problem of drought. Applications for drought orders and permits and the introduction of hosepipe bans were the most frequently mentioned.

The remaining area in drought is South and West of a line from Lincolnshire to Sussex, taking in Oxfordshire, where hosepipe bans imposed by seven water companies remain in place.

The Independent, 19/05/2012

  • There were also mentions of the impact of the drought. These mainly related to (i) wildlife and plants/gardens being affected and (ii) falling water levels in rivers and reservoirs.

SCIENTISTS fear rare eel species could be completely wiped out because of drought in the South of England.

The Daily Record, 17/07/2011

River levels are as low as in 1976 after another very dry week across England and Wales, the Environment Agency said. In its latest drought briefing yesterday, the Government agency said all areas had seen less than 1mm of rain.

The Herald, 31/03/2012

By examining 200 years of newspaper discourse, the analysis can trace repeated patterns and changes across time. This in turn can inform ways of thinking about how the media representation of drought has influenced the way in which the British public perceives and responds to drought events. Thus, the newspaper analysis will contribute to fostering more informed dialogues between policy makers, water companies, community leaders and the general public.
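The counts behind place-name maps like Figures 1 and 2 can be approximated with a simple window-based search: find each occurrence of “drought” and record any place names within a fixed number of tokens on either side. The tiny gazetteer, the ten-token window and the regex tokeniser below are assumptions for illustration; the project’s actual geoparsing and GIS pipeline is not described in this post.

```python
import re
from collections import Counter

# Hypothetical mini-gazetteer; a real study would need a full gazetteer.
GAZETTEER = {
    "britain", "england", "wales", "scotland", "yorkshire",
    "midlands", "lincolnshire", "sussex", "oxfordshire",
}
WINDOW = 10  # assumed span: tokens inspected on each side of the node word


def places_near(texts, node="drought", window=WINDOW):
    """Count place names occurring within +/- `window` tokens of `node`."""
    counts = Counter()
    for text in texts:
        # Lower-cased alphabetic runs; "drought-hit" yields "drought", "hit".
        tokens = re.findall(r"[a-z]+", text.lower())
        hits = set()  # token positions, so each place mention is counted once
        for i, tok in enumerate(tokens):
            if tok != node:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if tokens[j] in GAZETTEER:
                    hits.add(j)
        counts.update(tokens[j] for j in hits)
    return counts


if __name__ == "__main__":
    sample = [
        "Large areas of Britain face drought conditions, the Environment "
        "Agency said. Parts of the Midlands and Yorkshire are expected to be "
        "declared high risk in the agency's drought prospects report.",
    ]
    print(places_near(sample))
```

Run on the Telegraph extract quoted above, this reports one mention each for Britain and the Midlands, while Yorkshire falls just outside the ten-token window; the choice of window size therefore directly affects which places end up on a map.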

CASS go to ICAME38!

Researchers from CASS recently attended the ICAME38 conference at Charles University in Prague. Luckily, we arrived in Prague a day early which gave us plenty of time to explore the city. The weather was sunny, so we walked to Wenceslas Square, and then took the lift to the top of the Old Town Hall Tower to enjoy the views over the city.

The following day, it was time to begin the conference! Over the course of the event, seven CASS members presented their research (you can view full abstracts of all talks here). Up first was Robbie Love, presenting “FUCK in spoken British English revisited with the Spoken BNC2014”. By replicating the approaches of McEnery & Xiao (2004) on the new data contained in the Spoken BNC2014, Robbie found, among other things, that FUCK is now used equally by men and women, and that use of FUCK peaks when speakers are in their 20s and then decreases with age, apart from the 60-69 group which has a higher frequency than the 50-59 group.

Also discussing the BNC2014 project was Abi Hawtin, who presented “The British National Corpus Revisited: Developing parameters for Written BNC2014.” Abi discussed the progress on the project so far, and gave the audience a chance to look at the sampling frame which has been designed for the corpus. Abi also highlighted the difficulty of collecting certain text types, particularly published books.

Amelia Joulain-Jay presented “Describing collocation patterns in OCR data: are MI and LL reliable?” Amelia discussed the fact that data digitized using OCR procedures often has low levels of accuracy, and how this can affect corpus analysis. Amelia tested the reliability of Mutual Information (MI) and Log Likelihood (LL) statistics when working with OCR data and found, among other things, that both measures produce high rates of false positives. However, she also found that correcting OCR data using Overproof makes a positive difference for both statistics.
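The two statistics Amelia tested can be sketched as follows, assuming the simplest textbook formulation: a 2×2 contingency table built from the co-occurrence frequency of a node and collocate (o11), their individual frequencies (r1, c1) and the corpus size (n). Corpus tools may compute expected frequencies differently (for example, adjusting for the collocation window), so treat this as an illustration rather than a reproduction of Amelia’s method. The example at the end hints at one plausible route to false positives in OCR data: two rare, misrecognised strings that happen to co-occur once receive a very high MI score.

```python
import math


def contingency(o11, r1, c1, n):
    """Build the 2x2 table of observed frequencies from joint and marginal counts."""
    o12 = r1 - o11           # node without the collocate
    o21 = c1 - o11           # collocate without the node
    o22 = n - r1 - c1 + o11  # neither word
    return [[o11, o12], [o21, o22]]


def mi_score(o11, r1, c1, n):
    """Mutual Information: log2 of observed over expected co-occurrence."""
    e11 = r1 * c1 / n
    return math.log2(o11 / e11)


def log_likelihood(o11, r1, c1, n):
    """Log-likelihood (G2) summed over all four cells of the 2x2 table."""
    table = contingency(o11, r1, c1, n)
    rows = [sum(row) for row in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = table[i][j]
            if o > 0:  # cells with zero observations contribute nothing
                e = rows[i] * cols[j] / n
                g2 += o * math.log(o / e)
    return 2 * g2


if __name__ == "__main__":
    # Two hypothetical OCR-garbled strings, each seen once, seen together once,
    # in a corpus of one million tokens.
    print(round(mi_score(1, 1, 1, 1_000_000), 2))  # prints 19.93
    print(round(log_likelihood(1, 1, 1, 1_000_000), 2))
```

When a pair co-occurs exactly as often as chance predicts (for example o11 = 10, r1 = c1 = 100, n = 1,000), both functions return 0, which is a quick sanity check on the formulas.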

CASS Director Andrew Hardie also presented research using OCR data, in a talk titled “Plotting and comparing corpus lexical growth curves as an assessment of OCR quality in historical news data”. Andrew further drew our attention to the number of errors, or ‘noise’, in OCR data, and showed that if the number of types observed is plotted against the number of tokens at regular intervals (say, every 10,000 tokens), a curve characteristic of lexical growth over the span of a given corpus emerges. Andrew showed that visually comparing lexical growth curves across historical collections, or against modern corpora, therefore gives a good impression of the relative extent of OCR noise, and thus some estimate of how much such noise will impede analysis.
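The procedure can be sketched as follows: walk through the corpus token by token, keep track of how many distinct types have been seen, and sample that count every `step` tokens. The tokeniser (lower-cased runs of word characters) and the toy texts are assumptions for illustration; the talk’s exact settings are not given here. Because OCR errors produce many spurious one-off “words”, a noisy collection would be expected to show a steeper curve than a clean one of the same size.

```python
import re


def tokenize(text):
    """Lower-cased runs of word characters (an assumed, deliberately simple tokeniser)."""
    return re.findall(r"\w+", text.lower())


def lexical_growth_curve(tokens, step=10_000):
    """Return (tokens_seen, types_seen) pairs, sampled every `step` tokens."""
    seen = set()
    curve = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if i % step == 0:
            curve.append((i, len(seen)))
    if len(tokens) % step != 0:
        curve.append((len(tokens), len(seen)))  # close the final partial interval
    return curve


if __name__ == "__main__":
    clean = tokenize("the river is very low this year " * 3)
    # Hypothetical OCR-style misreadings of the same sentence.
    noisy = tokenize(
        "the river is very low this year "
        "tbe rivcr is vcry low tliis year "
        "the river 1s very l0w this ycar"
    )
    print(lexical_growth_curve(clean, step=7))  # [(7, 7), (14, 7), (21, 7)]
    print(lexical_growth_curve(noisy, step=7))  # [(7, 7), (14, 11), (21, 14)]
```

The clean text levels off at 7 types, while the noisy version keeps climbing to 14; the resulting pairs can be plotted with any charting library to compare collections visually.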

Also presenting was Dana Gablasova who discussed “A corpus-based approach to the expression of subjectivity in L2 spoken English: The case of ‘I + verb’ construction”. Dana used the Trinity Lancaster Corpus (TLC) to investigate the ‘I + verb’ construction in L1 Spanish and Italian speakers aged over 20 years. Dana found that with the increase in proficiency the frequency of emotive verbs decreased while the frequency of the epistemic verbs increased considerably. The study also identified the most frequent cognitive and emotive verbs and the trends in their use according to the proficiency level of L2 users.

Vaclav Brezina (and Matt Timperley, who was unfortunately not able to attend the conference) gave a software demonstration of #LancsBox – a new-generation corpus analysis tool developed at CASS. Vaclav showed that #LancsBox can:

  • Search, sort and filter examples of language use.
  • Compare frequency of words and phrases in multiple corpora and subcorpora.
  • Identify and visualise meaning associations in language (collocations).
  • Compute and visualise keywords.
  • Offer a simple but powerful interface.
  • Support a number of advanced features such as customisable statistical measures.

#LancsBox can be downloaded for free from the tool website http://corpora.lancs.ac.uk/lancsbox.

Dana and Vaclav also gave a presentation together, titled “MI-score-based collocations in language learning research: A critical evaluation.” Dana and Vaclav identified several issues in the use of the MI-score as a measure in language learning research (LLR), and used data from the BNC and TLC to:

  • place the MI-score in the context of other similar association measures and discuss the similarities and differences directly relevant to LLR
  • propose general principles for the selection of association measures in LLR.
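To show what placing the MI-score in the context of similar association measures can involve, the sketch below compares MI with logDice, one widely used alternative chosen here purely as an example (the post does not say which measures Dana and Vaclav compared). Both take the co-occurrence frequency f_xy and the word frequencies f_x and f_y; MI, in the simple form assumed here, additionally depends on corpus size n. Two properties relevant to language learning research follow from the formulas: MI gives a rare exclusive pair a much higher score than a frequent exclusive pair, whereas logDice gives both its maximum of 14; and MI changes when the corpus grows while logDice does not.

```python
import math


def mi_score(f_xy, f_x, f_y, n):
    """MI in its simplest form: log2(observed / expected), expected = f_x * f_y / n."""
    return math.log2(f_xy * n / (f_x * f_y))


def log_dice(f_xy, f_x, f_y):
    """logDice: 14 + log2 of the Dice coefficient; independent of corpus size."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


if __name__ == "__main__":
    n = 1_000_000
    # Hypothetical pairs whose words only ever occur together (exclusive pairs).
    pairs = {"rare": (2, 2, 2), "frequent": (200, 200, 200)}
    for label, (f_xy, f_x, f_y) in pairs.items():
        mi = round(mi_score(f_xy, f_x, f_y, n), 2)
        ld = round(log_dice(f_xy, f_x, f_y), 2)
        print(label, mi, ld)
    # Expected: "rare 18.93 14.0" and "frequent 12.29 14.0"
```

For learners, the practical consequence is that lists ranked by MI alone tend to favour infrequent, highly exclusive combinations, which is one reason to state and justify the choice of association measure explicitly.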

Finally, former CASS senior research associate Laura Paterson, who recently moved to a lectureship at the Open University, presented “Visualising corpora using Geographical Text Analysis (GTA): (Un)employment in the UK, a case study”, which stemmed from her work on the CASS Distressed Communities project. Laura showed how GTA can be used to generate maps from concordance lines. She showed lots of interesting data visualisations and highlighted the way in which GTA allows the researcher to visualise their corpus and adds a consideration of physical space to language analysis.

Aside from all of the fascinating talks, ICAME38 also had a brilliant social programme. We were able to go on two boat trips along the river. The first gave us brilliant views of the city, and the second allowed us to get much closer to the bridges and buildings which line the river. The Gala dinner was also great fun – we had a linguistics-themed menu and, best of all, an ABBA tribute band!

Thank you to all of the organisers of ICAME38 for such an enjoyable and well-organised conference!


Data-driven learning: learning from assessment

The process of converting valuable spoken corpus data into classroom materials is not necessarily straightforward, as a recent project conducted by Trinity College London reveals.

One of the buzzwords we increasingly hear from teacher trainers in English Language Teaching (ELT) is data-driven learning. This ties in with other contemporary pedagogies, such as discovery learning. A key component of this is how data from a corpus can be used to inform learning. One of our long-running projects with the Trinity Lancaster Corpus has been to see how we could use the spoken data in the classroom so that students could learn from assessment as well as for assessment. We have previously reported (From Corpus to Classroom 1 and From Corpus to Classroom 2) on our research into pragmatic and strategic examples in the data. These linguistic features and competences are often not practised – or are only superficially addressed – in course books, and yet they can be significant in enhancing learners’ communication skills, especially across cultures. Our ambition is to translate the data findings for classroom use, specifically to help teachers improve learners’ wider speaking competences.

We developed a process of constructing sample worksheets based on, and including, the corpus data. The data was contextualised and presented to teachers in order to give them an opportunity to use their expertise in guiding how this data could be developed for, and utilised in, the classroom. So, essentially, we asked teachers to collaborate on checking how useful the data and tasks were and potentially improving these tasks. We also asked teachers to develop their own tasks based on the data, and we now have the results of this project.

Overwhelmingly, the teachers were very appreciative of the data, and they each produced some great tasks. All of these were very useful for the classroom, but they did not really exploit the unique information we had identified in the data. We have started exploring why this might be the case.

What the teachers did was the following:

  • Created noticing and learner autonomy activities with the data (though most tasks would need much more scaffolding).
  • Focused on traditional information about phrases identified in the data, e.g. the strength and weakness of expressions of agreement.
  • Created activities that reflected traditional course book approaches.
  • Created reflective, contextual practice related to the data, although this was sometimes lost amid additional non-corpus texts.

We had expected that the data would inspire activities which:

  • showed new ways of approaching the data
  • supported discovery learning tasks with meaningful outcomes
  • explored the context and pragmatic functions of the data
  • reflected pragmatic usage, perhaps even referring to the L1 as a resource for this
  • focused on the listener and interpersonal aspects rather than just the speaker

It was clear that the teachers were intellectually engaged and excited, so we considered the reasons why their tasks had taken a more traditional path than expected. Many of these reasons have been raised in the past by Tim Johns and Simon Borg. There is no doubt that the heavy teacher workload affects how far teachers feel they can be innovative with materials. There is security in doing what you know, and what you know works. Also, many teachers, despite being in the classroom every day, need a certain confidence to design input when this has traditionally been left to syllabus and course book creators. We also realised that teachers would probably need more support in understanding corpus data, and many don’t have the time to do extra training. Finally, with this particular data, teachers may not be fully aware of the importance of pragmatic and strategic competences. These are often seen as an ‘add-on’ rather than a core competence, especially in contemporary communication, where English is largely used as a lingua franca.

Ultimately, there was a difference between what the researchers ‘saw’ and what the teachers ‘saw’. As an alternative, we asked a group of expert materials writers to produce new tasks, and they have produced some innovative material. We concluded that this may be a fairer approach. In other words, instead of expecting each of the roles involved in language teaching (SLA researchers, teachers, materials designers) to find the time to become experts in new skills, it may sometimes be better to use each other as a resource. This would still be a learning experience as we draw on each other’s expertise.

In future, if we want teachers to collaborate on designing materials, we must make sure that we discuss the philosophy or pedagogy behind our objectives (Rapti, 2013) with our collaborators, that we show how the data maps to relevant curricula, and that we recognise the restrictions caused by practical issues such as a lack of time or training opportunities.

The series of worksheets is now available from the Trinity College London website. More are to come, so keep checking.