Workshop on Corpus Linguistics in Ghana

Back in 2014, a team from CASS ran a well-received introductory workshop on Corpus Linguistics in Accra, Ghana – a country where Lancaster University has a number of longstanding academic partnerships and has recently established a campus.

We’re pleased to announce that in February of this year, we will be returning to Ghana and running two more introductory one-day events. Both events are free to attend, each consisting of a series of introductory lectures and practical sessions on topics in corpus linguistics and the use of corpus tools.

Because some participants at the 2014 workshop travelled a long way to attend, this time we are running events in two different locations in Ghana. The first workshop, on Tuesday 23rd February 2016, will be in Cape Coast, organised jointly with the University of Cape Coast. The second workshop, on Friday 26th February 2016, will be in Legon (near Accra), organised jointly with the University of Ghana. The same material will be covered at both workshops.

The workshop in 2014 was built largely around the use of our online corpus tools, particularly CQPweb. In the 2016 events, we’re going to focus instead on a pair of programs that you can run on your own computer to analyse your own data: AntConc and GraphColl. For that reason, we encourage participants who have their own corpora to bring them along to analyse in the workshop. These can be in any language, not just English! Don’t worry, however: we will also provide sample datasets for participants who don’t have their own data to work with.

We invite anyone in Ghana who wants to learn more about corpus linguistics, a versatile methodology for language analysis, to attend! While the events are free, places are limited, so advance registration is required.

Sino-UK Corpus Linguistics Summer School

At the end of July, Tony McEnery and I taught at the second Sino-UK corpus linguistics summer school, arranged between CASS and Shanghai Jiao Tong University. It was my first time visiting China, and we arrived during an especially hot season, with temperatures reaching 40 degrees Celsius (we were grateful for the air conditioning in the room we taught in).

Tony opened the summer school with an introductory session on corpus linguistics, followed a few days later by a session on collocations, in which he introduced GraphColl, CASS’s new tool for collocational networks. I gave a session on frequency and keywords, followed by later sessions on corpus linguistics and language teaching, and on corpus linguistics and discourse analysis. For the lab work components of our sessions, we didn’t use a computer lab. Instead, the students brought along their own laptops and tablets, and a few even carried out BNCweb searches on their mobile phones! I was impressed by how much the students already knew, and had to think on my feet a couple of times, particularly when asked to explain some of the more arcane aspects of WordSmith (such as the “Standardised Type Token ratio standard deviation”).

At the end of the summer school, a symposium was held where Tony gave a talk on his work with Dana Gablasova and Vaclav Brezina on the Trinity Learner Language corpus. I talked about some research I’m currently doing with Amanda Potts on change and variation in British and American English.

Also presenting were Prof. Gu Yueguo (Beijing Foreign Studies University), who talked about building a corpus of texts on Chinese medicine, and Prof. Roger K. Moore (University of Sheffield), who discussed adaptive speech recognition in noisy contexts.

We were made to feel very welcome by our host, Gavin Zhen, one of the lecturers at the university, who went out of his way to shuttle us on the 90-minute journey between the university and our hotel on the Bund.

It was a great event and it was nice to see students getting to grips with corpus linguistics so enthusiastically.

2014/15 in retrospect: Perspectives on Chinese

Looking back over the academic year as it draws to a close, one of the highlights for us here at CASS was the one-day seminar we hosted in January on Perspectives on Chinese: Talks in Honour of Richard Xiao. This event celebrated the contributions to linguistics of CASS co-investigator Dr. Richard Zhonghua Xiao, on the occasion of both his retirement in October 2014 (and his simultaneous appointment to an honorary position at the University!) and the completion of the two funded research projects that Richard has led under the aegis of CASS.

The speakers included present and former collaborators with Richard – some (including myself) from here at Lancaster, others from around the world – as well as other eminent scholars working in the areas that Richard has made his own: Chinese corpus linguistics (especially, but not only, comparative work), and the allied area of the methodologies that Richard’s work has both utilised and promulgated.

In the first presentation, Prof. Hongyin Tao of UCLA took as his starting point a classic observation of corpus-based studies, namely the existence and frequent occurrence of highly predictable strings or structures, and pointed out a little-noticed aspect of these highly predictable elements: they often involve lacunae, or null elements, where some key component of the meaning is simply left unstated and assumed. An example of this is the English expression under the influence, where the answer to “the influence of what?” is often left implicit but understood to be drugs or alcohol. It was pointed out that collocation patterns may identify the null elements, but that a simplistic application of collocation analysis may fail to yield useful results for expressions containing null elements. Finally, an extension of the analysis to yingxiang, the Chinese equivalent of influence, showed much the same tendencies at work, including, crucially, the importance of null elements.

The following presentation came from Prof. Gu Yueguo of the Chinese Academy of Social Sciences. Gu is well-known in the field of corpus linguistics for his many projects over the years to develop not just new corpora, but also new types of corpus resources – for example, his exciting development in recent years of novel types of ontology. His presentation at the seminar was very much in this tradition, arguing for a novel type of multimodal corpus for use in the study of child language acquisition.

At this point in proceedings, I was deeply honoured to give my own presentation. One of Richard’s recently-concluded projects involved the application of Douglas Biber’s method of Multidimensional Analysis to translational English as the “Third Code”. In my talk, I presented methodological work which, together with Xianyao Hu, I have recently undertaken to assist this kind of analysis by embedding tools for the MD approach in CQPweb. A shorter version of this talk was subsequently presented at the ICAME conference in Trier at the end of May.

Prof. Xu Hai of Guangdong University of Foreign Studies gave a presentation on the study of learner Chinese, an issue which was prominent among Richard’s concerns as director of the Lancaster University Confucius Institute. As noted above, Richard has led a project funded by the British Academy, looking at the acquisition of Mandarin Chinese as a foreign language; since Xu is a partner on that project, his presentation of a preliminary report on the Guangwai Lancaster Chinese Learner Corpus was timely indeed. This new learner corpus, already in excess of a million words in size and consisting of a roughly 60-40 split between written and spoken materials, follows the tradition of the best learner corpora for English by sampling learners from many different national backgrounds, but also, interestingly, includes some longitudinal data. Once the corpus is complete, its value for the study of L2 Chinese interlanguage will be incalculable.

The next presentation was another from colleagues of Richard here at Lancaster: Dr. Paul Rayson and Dr. Scott Piao gave a talk on the extension of the UCREL Semantic Analysis System (USAS) to Chinese. This has been accomplished by mapping the vast semantic lexicon originally created for English across to Chinese, first by automatic matching and then by manual editing. Scott and Paul, with other colleagues including CASS’s Carmen Dayrell, went on to present this work, along with work on other languages, at the prestigious NAACL HLT 2015 conference, in whose proceedings a write-up has been published.

Prof. Jiajin Xu (Beijing Foreign Studies University) then made a presentation on corpus construction for Chinese. This area has, of course, been a major locus of activity by Richard over the years: his Lancaster Corpus of Mandarin Chinese (LCMC), a Mandarin match for the Brown corpus family, is one of the best openly-available linguistic resources for that language, and his ZJU Corpus of Translational Chinese (ZCTC) was a key contribution of his research on translation in Chinese. Xu’s talk presented a range of current work building on that foundation, especially the ToRCH (“Texts of Recent Chinese”) family of corpora, a planned Brown-family-style diachronic sequence of snapshot corpora in Chinese from BFSU, starting with the ToRCH2009 edition. Xu rounded out the talk with some case studies of applications for ToRCH, looking first at recent lexical change in Chinese by comparing ToRCH2009 and LCMC, and then at features of translated language in Chinese by comparing ToRCH2009 and ZCTC.

The last presentation of the day was from Dr. Vittorio Tantucci, who has recently completed his PhD at the Department of Linguistics and English Language at Lancaster, and who specialises in a number of issues in cognitive linguistic analysis, including intersubjectivity and evidentiality. His talk specifically addressed the Mandarin evidential marker 过 guo, and the path it took from a verb meaning ‘to get through, to pass by’ to a verbal grammatical element. He argued that this exemplified a path by which an evidential marker can originate from a traversative structure, a phenomenon not noted in the literature on this kind of grammaticalisation, which focuses on two other paths of development: from verbal constructions conveying result, and from those conveying completion. Vittorio’s work is extremely valuable, not only in its own right but as a demonstration of the role that corpus-based analysis and cross-linguistic evidence have to play in linguistic theory. Given Richard’s own work on the grammar and semantics of aspect in Chinese, a celebration of Richard’s career would not have been complete without an illustration of how this trend in current linguistics continues to develop.

All in all, the event was a magnificent tribute to Richard and his highly productive research career, and a potent reminder of how diverse his contributions to the field have actually been, and of their far-reaching impact among practitioners of Chinese corpus linguistics. The large and lively audience certainly seemed to agree with our assessment!

Our deep thanks go out to all the invited speakers, especially those who travelled long distances to attend – our speaker roster stretched from California in the west, to China in the east.

Welcoming the new members of the Climate Change team

We are delighted to announce that Dr. Marcus Müller from the University of Heidelberg (Germany) and Dr. Maria Cristina Caimotto from the University of Torino (Italy) have kindly agreed to join the CASS Changing Climate project, led by Professor John Urry.

Both will have a great deal to contribute to the project. Their experience and language skills will allow us to broaden the project’s scope and also to examine the discourses around climate change issues in German and Italian newspapers.

Dr. Marcus Müller is a senior lecturer in German linguistics at the Department of German at the University of Heidelberg, Germany. He is also an associate member of the Heidelberg Centre for Transcultural Studies (HCTS) and a teaching fellow of the Heidelberg Graduate School for Humanities and Social Sciences (HGGS). He has also been a visiting lecturer at the universities of Paderborn and Düsseldorf, as well as at the universities of Tashkent, Budapest and Beijing. Dr. Müller is the founder and spokesman of the German-Chinese graduate network “Sprachkulturen – Fachkulturen” and the “Language and Knowledge” Graduate Platform. His research interests include corpus linguistics, discourse analysis, grammatical variation, language and social roles, and language and art.

Dr. Maria Cristina Caimotto is a research fellow in English Language and Translation at the Department of Culture, Politics and Society of the University of Torino. She is also a member of the Environmental Humanities International Research Group. Her research interests include translation studies, political discourse and environmental discourse. In her work, the contrastive analysis of texts in different languages (translated or comparable) is employed as a tool for critical discourse analysis.

Reflections from the CASS student challenge panel member, part 3

Pamela Irwin, this year’s CASS student challenge panel member, is looking back on her past year of research. This is part 3 of her reflections; if you need to catch up, parts 1 and 2 of the series are also available.

Lately, I have been examining sociolinguistics and its related sub-disciplines as part of my exploration of the synergy between the social sciences (sociology/social gerontology) and language (corpus linguistics) in relation to my research.

My first task was to compare sociolinguistics with the sociology of language. According to the literature, in brief, the focus of sociolinguistics is to ascertain the effect of society on language, whereas the sociology of language is oriented around the influence of language on society.

Even with this conceptual clarification, I still found it quite difficult to assimilate the vertical (layers) and horizontal (scope) dimensions of sociolinguistics and then to differentiate within and between the sociolinguistic sub-specialities. At this stage, it was a relief to discover that some of these social/linguistic links had already been mapped, including sociolinguistics and corpus linguistics (Baker, 2010), critical discourse analysis and corpus linguistics (Baker, Gabrielatos, KhosraviNik, Krzyżanowski, McEnery & Wodak, 2008), realism and corpus linguistics (Sealey, 2010) and linguistics and ethnography (Rampton, Maybin & Tusting, 2007).

Linguistic ethnography has particular relevance to my study’s ethnographic methodology. During my ethnographic fieldwork in rural Australia, I obtained data from multiple sources: historical records, contemporary materials such as local newspapers and community notices, participant interviews and journals, and field notes. As I had naively assumed that all types of data are equally valid, Creese’s (2011) advocacy of a non-hierarchical balance between researcher fieldnotes and interactional data (interviews, conversations) was reassuring.

According to Rampton (2007), a distinctive linguistic ethnography is still evolving and, as such, it remains open to wider interpretative approaches. Here, Sealey’s (2007) juxtaposition of linguistic ethnography and realism to address ‘what kinds of language in what circumstances and with what outcome?’ (p. 641) makes a valuable contribution to my analytical repertoire. For instance, my findings suggest that the older and late middle-aged women’s life history narratives vary significantly in terms of their depth (reflective/instrumental) and breadth (expansive/constrained). While these differences do not seem to be related to the type of data (written versus spoken accounts), the influence of temporal (age, period, cohort) and situational (rural/urban, ‘local’/newcomer) circumstances on the women’s accounts is less clear. Corpus linguistics provides an objective analytical method of unravelling these complex inter-relationships.


Baker, P. (2010). Sociolinguistics and corpus linguistics. Edinburgh: Edinburgh University Press.

Baker, P., Gabrielatos, C., KhosraviNik, M., Krzyżanowski, M., McEnery, T., & Wodak, R. (2008). A useful methodological synergy? Combining critical discourse analysis and corpus linguistics to examine discourses of refugees and asylum seekers in the UK press. Discourse & Society, 19(3), 273-306. doi: 10.1177/0957926508088962

Creese, A. (2011). Making local practices globally relevant in researching multilingual education. In F. M. Hult & K. A. King (Eds.), Educational linguistics in practice: Applying the local globally and the global locally (pp. 41-59). Bristol, UK: Multilingual Matters.

Rampton, B. (2007). Neo-Hymesian linguistic ethnography in the United Kingdom. Journal of Sociolinguistics, 11(5), 584-607. doi: 10.1111/j.1467-9841.2007.00341.x

Sealey, A. (2007). Linguistic ethnography in realist perspective. Journal of Sociolinguistics, 11(5), 641-660. doi: 10.1111/j.1467-9841.2007.00341.x

Sealey, A. (2010). Probabilities and surprises: A realist approach to identifying linguistic and social patterns, with reference to an oral history corpus. Applied Linguistics, 31(2), 215-235. doi: 10.1093/applin/amp023

Are you interested in becoming the next student challenge panel member? Apply to attend our free summer school to learn more.

Translation and contrastive linguistic studies at the interface of English and Chinese

A forthcoming special issue of Corpus Linguistics and Linguistic Theory, guest-edited by Dr Richard Xiao and Professor Naixing Wei, President of the Corpus Linguistics Society of China, is now available online as Ahead of Print at the journal website.

This special issue focuses on corpus-based translation and contrastive linguistic studies involving two genetically unrelated languages, namely English and Chinese, which we believe form an important interface with unique features arising from the mutual interaction between the two languages.

Corpora have tremendously benefited translation and contrastive studies, and at the same time, corpus-based translation and contrastive linguistic studies have significantly expanded the scope of corpus linguistic research. While contrastive linguistics and translation studies have traditionally been regarded as two separate disciplines within applied linguistics, there are many points of contact between them; and with their common corpus-based approach and the types of data they usually share (e.g. comparable and parallel corpora), corpus-based translation and contrastive linguistic studies have become even more closely interconnected, as demonstrated by the articles included in this special issue.

This special issue of Corpus Linguistics and Linguistic Theory includes five research articles together with an extensive introduction written by the guest editors.

These studies combine contrastive analysis and translation studies on the basis of comparable corpora (either multilingual or monolingual) and parallel corpora of English and Chinese, the two most widely spoken languages in the world, which are genetically unrelated. While the decision to focus on English and Chinese in the research reported in this volume was largely based on the authors’ strongest languages (they are all competently bilingual in Chinese and English), the significance of the typological distance between the two languages should not be underestimated. Compared with studies of typologically related languages, translation and cross-linguistic studies of genetically distant languages such as English and Chinese can have more important implications for linguistic theorization. Studying such language pairs helps us gain a better appreciation of the scale of variability in the human language system, whereas theories and observations based on closely related language pairs can give rise to conclusions which seem certain but which, when examined in the context of a language pair such as English and Chinese, are not merely problematized afresh but become significantly more challenging to resolve (cf. Xiao and McEnery 2010).

The studies reported in this special issue explore features at the interface of English and Chinese, and can be expected to have important implications for linguistic theorizing as well as for practice.