CASS go to ICAME38!

Researchers from CASS recently attended the ICAME38 conference at Charles University in Prague. Luckily, we arrived in Prague a day early, which gave us plenty of time to explore the city. The weather was sunny, so we walked to Wenceslas Square and then took the lift to the top of the Old Town Hall Tower to enjoy the views over the city.

The following day, it was time to begin the conference! Over the course of the event, seven CASS members presented their research (you can view full abstracts of all talks here). Up first was Robbie Love, presenting “FUCK in spoken British English revisited with the Spoken BNC2014”. By applying the approaches of McEnery & Xiao (2004) to the new data in the Spoken BNC2014, Robbie found, among other things, that FUCK is now used equally by men and women, and that its use peaks when speakers are in their 20s and then decreases with age, except that the 60–69 age group uses it more frequently than the 50–59 group.

Also discussing the BNC2014 project was Abi Hawtin, who presented “The British National Corpus Revisited: Developing parameters for Written BNC2014.” Abi discussed the project’s progress so far and gave the audience a chance to look at the sampling frame designed for the corpus. She also highlighted the difficulty of collecting certain text types, particularly published books.

Amelia Joulain-Jay presented “Describing collocation patterns in OCR data: are MI and LL reliable?” Amelia discussed how data digitized using Optical Character Recognition (OCR) often has low levels of accuracy, and how this can affect corpus analysis. She tested the reliability of Mutual Information (MI) and Log Likelihood (LL) statistics when working with OCR data and found, among other things, that both measures produce high rates of false positives. However, she also found that correcting OCR data using Overproof makes a positive difference for both statistics.
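For readers less familiar with the two statistics, here is a minimal Python sketch of what MI and LL compute for a single node/collocate pair. It uses the common 2×2 contingency-table formulation (O11 = co-occurrence frequency, R1 = node frequency, C1 = collocate frequency, N = sample size). This is our own illustration, not the code used in Amelia's study or in any particular corpus tool, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Table = List[List[float]]


def contingency_tables(o11: int, r1: int, c1: int, n: int) -> Tuple[Table, Table]:
    """Observed and expected 2x2 tables for a node/collocate pair (illustrative).

    o11: co-occurrence frequency of the pair
    r1:  total frequency of the node word
    c1:  total frequency of the collocate
    n:   sample size (e.g. number of tokens)
    """
    if n <= 0:
        raise ValueError("n must be positive")
    o12, o21 = r1 - o11, c1 - o11
    o22 = n - r1 - c1 + o11
    if min(o11, o12, o21, o22) < 0:
        raise ValueError("frequencies are inconsistent")
    r2, c2 = n - r1, n - c1
    observed = [[o11, o12], [o21, o22]]
    # Expected frequencies under independence: E_ij = R_i * C_j / N
    expected = [[r1 * c1 / n, r1 * c2 / n], [r2 * c1 / n, r2 * c2 / n]]
    return observed, expected


def mutual_information(o11: int, r1: int, c1: int, n: int) -> float:
    """Pointwise MI: log2(O11 / E11)."""
    if o11 <= 0:
        raise ValueError("MI is undefined when the pair never co-occurs")
    observed, expected = contingency_tables(o11, r1, c1, n)
    return math.log2(observed[0][0] / expected[0][0])


def log_likelihood(o11: int, r1: int, c1: int, n: int) -> float:
    """Log likelihood (G2): 2 * sum(O_ij * ln(O_ij / E_ij)), treating 0 * ln 0 as 0."""
    observed, expected = contingency_tables(o11, r1, c1, n)
    g2 = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            if o > 0:  # a non-zero observed cell always has a non-zero expected value
                g2 += o * math.log(o / e)
    return 2 * g2


if __name__ == "__main__":
    print(mutual_information(10, 100, 50, 10_000))  # log2(20), about 4.32
    # Two words that each occur once, and together: MI = log2(10000), about 13.29
    print(mutual_information(1, 1, 1, 10_000))
    print(log_likelihood(10, 100, 50, 10_000))
```

Because MI depends only on the ratio of O11 to E11, pairs of very rare word forms can receive very high scores; this low-frequency bias is a widely noted property of MI, and it may be relevant when OCR errors create many rare word forms.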

CASS Director Andrew Hardie also presented research using OCR data, in a talk titled “Plotting and comparing corpus lexical growth curves as an assessment of OCR quality in historical news data”. Andrew drew further attention to the number of errors, or ‘noise’, in OCR data, and showed that if the number of types observed is plotted against the number of tokens observed at regular intervals (say, every 10,000 tokens), a curve characteristic of lexical growth over the span of a given corpus emerges. Andrew showed that visually comparing such lexical growth curves across historical collections, or against modern corpora, therefore gives a good impression of the relative extent of OCR noise, and thus some estimate of how much that noise will impede analysis.
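The counting behind a lexical growth curve is simple enough to sketch. The Python below is a minimal illustration, not Andrew's implementation: it assumes the corpus has already been tokenized (and normalised however you prefer), and the function name `lexical_growth_curve` and its `interval` parameter are hypothetical. It returns (tokens seen, types seen) pairs that could then be plotted with any charting tool.

```python
from typing import Iterable, List, Tuple


def lexical_growth_curve(tokens: Iterable[str], interval: int = 10_000) -> List[Tuple[int, int]]:
    """Return (tokens_seen, types_seen) checkpoints, one every `interval` tokens.

    A final checkpoint is added for the end of the corpus if its length is
    not an exact multiple of `interval`.
    """
    if interval <= 0:
        raise ValueError("interval must be a positive integer")
    seen_types = set()  # distinct word forms encountered so far
    points: List[Tuple[int, int]] = []
    n = 0
    for n, token in enumerate(tokens, start=1):
        seen_types.add(token)
        if n % interval == 0:
            points.append((n, len(seen_types)))
    # Record the end of the corpus if it fell between checkpoints.
    if n > 0 and (not points or points[-1][0] != n):
        points.append((n, len(seen_types)))
    return points


if __name__ == "__main__":
    # Toy data: the "noisy" version imitates OCR misreadings (tlie, thc, s4t).
    clean = "the cat sat on the mat the cat sat".split()
    noisy = "the cat sat on tlie mat thc cat s4t".split()
    print(lexical_growth_curve(clean, interval=3))  # [(3, 3), (6, 5), (9, 5)]
    print(lexical_growth_curve(noisy, interval=3))  # [(3, 3), (6, 6), (9, 8)]
```

In the toy example the noisy text's curve climbs faster, because each misreading is counted as a new type; on real collections the same effect is what makes the curves useful for comparison.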

Also presenting was Dana Gablasova, who discussed “A corpus-based approach to the expression of subjectivity in L2 spoken English: The case of ‘I + verb’ construction”. Dana used the Trinity Lancaster Corpus (TLC) to investigate the ‘I + verb’ construction in L1 Spanish and Italian speakers aged over 20. She found that as proficiency increased, the frequency of emotive verbs decreased while the frequency of epistemic verbs increased considerably. The study also identified the most frequent cognitive and emotive verbs and the trends in their use according to the proficiency level of L2 users.

Vaclav Brezina (and Matt Timperley, who was unfortunately not able to attend the conference) gave a software demonstration of #LancsBox – a new-generation corpus analysis tool developed at CASS. Vaclav showed that #LancsBox can:

  • Search, sort and filter examples of language use.
  • Compare frequency of words and phrases in multiple corpora and subcorpora.
  • Identify and visualise meaning associations in language (collocations).
  • Compute and visualise keywords.
  • Offer a simple but powerful interface.
  • Support a number of advanced features such as customisable statistical measures.

#LancsBox can be downloaded for free from the tool website http://corpora.lancs.ac.uk/lancsbox.

Dana and Vaclav also gave a presentation together, titled “MI-score-based collocations in language learning research: A critical evaluation.” They identified several issues in the use of the MI-score as a measure in language learning research (LLR), and used data from the BNC and TLC to:

  • place the MI-score in the context of other similar association measures and discuss the similarities and differences directly relevant to LLR; and
  • propose general principles for the selection of association measures in LLR.

Finally, former CASS senior research associate Laura Paterson, who recently moved to a lectureship at the Open University, presented “Visualising corpora using Geographical Text Analysis (GTA): (Un)employment in the UK, a case study”, which stemmed from her work on the CASS Distressed Communities project. Laura showed how GTA can be used to generate maps from concordance lines. She presented many interesting data visualisations and highlighted how GTA allows researchers to visualise their corpora and brings a consideration of physical space to language analysis.

Aside from all of the fascinating talks, ICAME38 also had a brilliant social programme. We were able to go on two boat trips along the river: the first gave us wonderful views of the city, and the second allowed us to get much closer to the bridges and buildings that line the river. The gala dinner was also great fun – we had a linguistics-themed menu and, best of all, an ABBA tribute band!

Thank you to all of the organisers of ICAME38 for such an enjoyable and well-organised conference!

 

Registration now open for Lancaster Summer Schools in Corpus Linguistics and other Digital Methods!

We are pleased to announce that we will be running our hugely popular summer schools again in 2017! We will be running six free training events that cover the techniques of corpus linguistics, computational analysis of language and geographical information systems. The schools include both lectures and practical sessions that introduce the latest developments in the field and the application of cutting-edge analytical techniques. The summer schools are taught by leading experts, both from CASS and from other departments and institutions (CASS Challenge Panel).

The summer schools running in 2017 are:

  • Corpus Linguistics for Language Studies
  • Corpus Linguistics for Social Science
  • Corpus Linguistics for the Humanities
  • Statistics for Corpus Linguistics
  • Geographical Information Systems for the Digital Humanities
  • Corpus-based NLP

The summer schools will take place over four days (27th – 30th June 2017) and are free to attend. Click here for more information and to register.

Dealing with Optical Character Recognition errors in Victorian newspapers

CASS PhD student Amelia Joulain-Jay has been investigating the extent to which OCR errors are a problem when researching historical texts, and whether these errors can be corrected. Amelia’s work was recently featured in a very interesting blog post on the British Library’s website – you can read the full post here.

 

Tracking terrorists who leave a technological trail.

Dr Sheryl Prentice’s work on using technology to aid in the detection of terrorists has been gaining a lot of attention in the media this week! Sheryl’s article, which discusses the different ways in which technology can be used to tackle terrorism and how effective these methods are, was originally published in The Conversation and then republished by the ‘i’ newspaper on 23rd June 2016. You can read the original article here.

Introducing Yufang Qian to CASS

CASS is delighted to welcome visiting researcher Yufang Qian to the centre, where she will be working on a project exploring the representation of Chinese medicine in British historical news texts over the last 200 years. Continue reading to find out more about Yufang and the research which she will be undertaking!


In 2009, Yufang Qian obtained her PhD at Lancaster University with a dissertation on corpus-based discourse studies, under the supervision of Professors Tony McEnery and Paul Baker. She then returned to Zhejiang University of Media and Communications (ZUMC) and was appointed Professor in 2011.

Yufang is committed to popularizing the combination of corpus and discourse approaches in China. She has taught corpus linguistics and media discourse at ZUMC to students at all levels and has supervised more than 50 dissertations relating to corpus-based discourse studies, in disciplines as diverse as communication studies, education, sociology, psychology and politics. Her students have gone on either to pursue further study in this area in the UK, USA, Japan, South Korea and Hong Kong, or to apply the expertise they gained in various institutions and organizations in China.

In 2010 Yufang published ‘Corpus and Critical Discourse Analysis’ in the journal Foreign Language Teaching and Research, the first paper to introduce corpus-based discourse analysis in a Chinese journal. To date, it has been cited 48 times and downloaded 3515 times. In the past few years she has published nearly two dozen journal articles on corpus-based media discourse analysis. Her PhD thesis, Discursive constructions around terrorism in the People’s Daily and The Sun before and after 9.11 (Oxford: Peter Lang, 2010), won third prize in the Sixth Outstanding Achievement Awards for Research in Humanities and Social Sciences, the top governmental award for social science in China, conferred by the Ministry of Education in 2013.

To explain and promote the application of the corpus-based discourse approach, Yufang has spoken at many national and international conferences and has given lectures at more than a dozen universities in China. She is the Founding Director of the Research Center for Discourse and Communications at ZUMC, the first of its kind in China. She is principal investigator for many research projects, such as ‘Discursive constructions around the low carbon economy in the press of China, the UK and the US’, funded by the Ministry of Education; and ‘A corpus-based comparative study of Western and Chinese political discourse analysis’, funded by the National Social Science Foundation. She is also co-principal investigator of the project entitled ‘A comparative study of the discourse system in Chinese dream films’, funded by the National Social Science Foundation.

Yufang’s comparative perspective is evident from her early paper, ‘Contrasting signals of politeness between Western and Eastern countries’, published in Education in China (ed. E Fizette; Fenton, MI: Hana Guild, 1993). Since 2014, she has been working with CCPN Global (China in Comparative Perspective Network Global, an affiliate member of the Academy of Social Sciences, UK) to develop a project entitled ‘Corpus approaches for Chinese social science (CACSS)’. She is organizing a panel on ‘Corpus approaches to governance in the context of climate change’ at the 3rd Global China Dialogue on 2 December 2016 at the British Academy.

Yufang has recently returned to her alma mater, Lancaster University, as a visiting researcher, where she will work with Professor McEnery on a project exploring the representation of Chinese medicine in British historical news texts over the last 200 years. This diachronic observation of discourse on Chinese medicine is significant in that it will provide specific evidence of the media’s role in public health vis-à-vis the use of traditional Chinese medicine in the West. It is hoped that the findings of this study will help bridge the gap between Western and Chinese medicine, both of which play a role in serving public health.

FireAnt is making headlines!

FireAnt, a tool for extracting, visualising and exporting social media data, is making headlines! The tool, developed by Claire Hardaker and Laurence Anthony at CASS, has been noted by the Daily Mail for its abilities to “hunt down terrorists and trolls”. We’re delighted that FireAnt is being recognised for its capabilities in social media data analysis, and that this is being illustrated to the public in mainstream news.

You can read the article here.

You can read more about FireAnt and its development here and here.

News: Professor John Urry

CASS is extremely sorry to hear of the death of Professor John Urry. We have lost a very distinguished and enthusiastic member of our team, and he will be greatly missed by all at the centre. You can read more about John’s life and work here.

CASS PhD Student Awarded Gale Dissertation Research Fellowship!

CASS is delighted that the Gale Dissertation Research Fellowship for research to be undertaken in 2016 has been awarded to Amelia Joulain-Jay, a PhD student at CASS, for her work on using Geographical Information Systems and Corpus Linguistics methods to investigate how places were represented in nineteenth-century British newspapers.

The Gale Dissertation Research Fellowship is awarded by the Research Society for Victorian Periodicals (RSVP) in support of dissertation research that makes substantial use of full-text digitized collections of 19th-century British magazines and newspapers. The Fellowship aims to support historical and literary research that deepens our understanding of the 19th-century British press in all its rich variety, and to encourage the scholarly use of collections of full-text digital facsimiles of these primary sources in aid of that research. A prize of $1500 will be awarded, together with a one-year password-protected subscription to selected digital collections from Gale, including 19th Century UK Periodicals and 19th Century British Library Newspapers.

Congratulations to CASS’s Professor Ram-Prasad

Congratulations to CASS’s Professor Ram-Prasad, whose book ‘Divine Self, Human Self: The Philosophy of Being in Two Gita Commentaries’ (Bloomsbury, 2013) has been announced as the winner of the ‘Best Book in Hindu-Christian Studies (2011-2015)’ award. The Society for Hindu-Christian Studies will hold a panel discussion of Professor Ram-Prasad’s book at its November 2016 annual meeting in San Antonio.

Welcome to our newest senior research associate – Gavin Brookes!

CASS just keeps getting fuller! Gavin Brookes is the newest senior research associate to join the centre, and will be working on our “Beyond the checkbox – understanding what patients say in feedback on NHS services” project. Here’s a little about Gavin, in his own words:

I am very excited to begin my role as Senior Research Associate working with Professor Paul Baker on the CASS project, “Beyond the checkbox – understanding what patients say in feedback on NHS services”. The purpose of this research is to help the National Health Service better understand patient feedback with a view to improving frontline healthcare service provision (you can find more info here: http://cass.lancs.ac.uk/?p=1832). This project is corpus linguistics at its most applied. Its aims are timely and have clear and significant practical consequences, and I am thrilled to be a part of it!

I am endlessly fascinated by the relationship between discourse and social life and have adopted corpus linguistic, (critical) discourse analytical and multimodal approaches to investigate this relationship in my research to date. My enthusiasm for this project will come as little surprise when I tell you that I am particularly interested in how discourse shapes and represents our experiences and understandings of health and illness. My ESRC-funded doctoral research, undertaken in the School of English Studies at The University of Nottingham, examines the discursive construction of a contested condition known as diabulimia in a specialised corpus of online health messages.

Outside academia I spend my time walking, travelling, reading fantasy and science fiction novels, partaking in pub quizzes, and following my beloved (if perpetually under-achieving) Mansfield Town FC. I am delighted to be here and can’t wait to learn more about, and get involved in, the research that is being undertaken within the Department.