2014/15 in retrospective: Perspectives on Chinese

Looking back over the academic year as it draws to a close, one of the highlights for us here at CASS was the one-day seminar we hosted in January on Perspectives on Chinese: Talks in Honour of Richard Xiao. This event celebrated the contributions to linguistics of CASS co-investigator Dr. Richard Zhonghua Xiao, on the occasion of both his retirement in October 2014 (and simultaneous taking-up of an honorary position with the University!), and the completion of the two funded research projects which Richard has led under the aegis of CASS.

The speakers included present and former collaborators with Richard – some (including myself) from here at Lancaster, others from around the world – as well as other eminent scholars working in the areas that Richard has made his own: Chinese corpus linguistics (especially, but not only, comparative work), and the allied area of the methodologies that Richard’s work has both utilised and promulgated.

In the first presentation, Prof. Hongyin Tao of UCLA took a classic observation of corpus-based studies – the existence, and frequent occurrence, of highly predictable strings or structures – and pointed out a little-noticed aspect of these highly predictable elements: they often involve lacunae, or null elements, where some key component of the meaning is simply left unstated and assumed. An example of this is the English expression under the influence, where “the influence of what?” is often implicit, but understood to be drugs/alcohol. Tao pointed out that collocation patterns may identify the null elements, but that a simplistic application of collocation analysis may fail to yield useful results for expressions containing null elements. Finally, an extension of the analysis to yingxiang, the Chinese equivalent of influence, showed much the same tendencies – including, crucially, the importance of null elements – at work.
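The point about simplistic collocation analysis can be illustrated with a small sketch. The Python below is not the method used in the talk; it is a plain window-based collocate count over a handful of invented sentences. The sentences, the function name window_collocates and the window size are all assumptions made purely for illustration.

```python
from collections import Counter

def window_collocates(sentences, node, span=3):
    """Count words occurring within `span` tokens to the right of `node`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            if tok == node:
                # Slice is empty when the node word ends the sentence.
                counts.update(tokens[i + 1 : i + 1 + span])
    return counts

# Invented toy sentences (NOT corpus data).
toy = [
    "he was driving under the influence",
    "she was arrested for driving under the influence",
    "he was driving under the influence of alcohol",
    "they were caught under the influence again",
]

right = window_collocates(toy, "influence")
```

In this toy set only one of the four uses spells out what the influence is; the others leave the slot empty, so a naive right-hand collocate count has very little to work with.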

The following presentation came from Prof. Gu Yueguo of the Chinese Academy of Social Sciences. Gu is well-known in the field of corpus linguistics for his many projects over the years to develop not just new corpora, but also new types of corpus resources – for example, his exciting development in recent years of novel types of ontology. His presentation at the seminar was very much in this tradition, arguing for a novel type of multimodal corpus for use in the study of child language acquisition.

At this point in proceedings, I was deeply honoured to give my own presentation. One of Richard’s recently-concluded projects involved the application of Douglas Biber’s method of Multidimensional Analysis to translational English as the “Third Code”. In my talk, I presented methodological work which, together with Xianyao Hu, I have recently undertaken to assist this kind of analysis by embedding tools for the MD approach in CQPweb. A shorter version of this talk was subsequently presented at the ICAME conference in Trier at the end of May.

Prof. Xu Hai of Guangdong University of Foreign Studies gave a presentation on the study of Learner Chinese, an issue which was prominent among Richard’s concerns as director of the Lancaster University Confucius Institute. As noted above, Richard has led a project funded by the British Academy, looking at the acquisition of Mandarin Chinese as a foreign language; as a partner on that project, Xu’s presentation of a preliminary report on the Guangwai Lancaster Chinese Learner Corpus was timely indeed. This new learner corpus – already in excess of a million words in size, and consisting of a roughly 60-40 split between written and spoken materials – follows the tradition of the best learner corpora for English by sampling learners with many different national backgrounds, but also, interestingly, includes some longitudinal data. Once complete, the value of this resource for the study of L2 Chinese interlanguage will be incalculable.

The next presentation was another from colleagues of Richard here at Lancaster: Dr. Paul Rayson and Dr. Scott Piao gave a talk on the extension of the UCREL Semantic Analysis System (USAS) to Chinese. This has been accomplished by means of mapping the vast semantic lexicon originally created for English across to Chinese, initially by automatic matching, and secondarily by manual editing. Scott and Paul, with other colleagues including CASS’s Carmen Dayrell, went on to present this work – along with work on other languages – at the prestigious NAACL HLT 2015 conference, in whose proceedings a write-up has been published.

Prof. Jiajin Xu (Beijing Foreign Studies University) then made a presentation on corpus construction for Chinese. This area has, of course, been a major locus of activity by Richard over the years: his Lancaster Corpus of Mandarin Chinese (LCMC), a Mandarin match for the Brown corpus family, is one of the best openly-available linguistic resources for that language, and his ZJU Corpus of Translational Chinese (ZCTC) was a key contribution of his research on translation in Chinese. Xu’s talk presented a range of current work building on that foundation, especially the ToRCH (“Texts of Recent Chinese”) family of corpora – a planned Brown-family-style diachronic sequence of snapshot corpora in Chinese from BFSU, starting with the ToRCH2009 edition. Xu rounded out the talk with some case studies of applications for ToRCH, looking first at recent lexical change in Chinese by comparing ToRCH2009 and LCMC, and then at features of translated language in Chinese by comparing ToRCH2009 and ZCTC.
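A comparison of word frequencies between two corpora of this kind is often operationalised as a keyness calculation. The sketch below uses the log-likelihood statistic that is widely used in corpus linguistics for this purpose. Whether Xu’s case studies used this particular measure is not stated here, and the function name log_likelihood is hypothetical.

```python
import math

def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Two-corpus log-likelihood for a single word.

    freq_a, freq_b: occurrences of the word in corpus A and corpus B.
    size_a, size_b: total token counts of each corpus.
    """
    total_freq = freq_a + freq_b
    # Expected frequencies if the word were equally common in both corpora.
    expected_a = size_a * total_freq / (size_a + size_b)
    expected_b = size_b * total_freq / (size_a + size_b)
    ll = 0.0
    # A zero observed frequency contributes nothing (0 * ln 0 is taken as 0).
    if freq_a > 0:
        ll += freq_a * math.log(freq_a / expected_a)
    if freq_b > 0:
        ll += freq_b * math.log(freq_b / expected_b)
    return 2 * ll
```

Given a word’s frequency in each corpus and the two corpus sizes, a larger value indicates a frequency difference that is less likely to be due to chance; values above about 3.84 correspond to p < 0.05 at one degree of freedom.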

The last presentation of the day was from Dr. Vittorio Tantucci, who has recently completed his PhD at the Department of Linguistics and English Language at Lancaster, and who specialises in a number of issues in cognitive linguistic analysis including intersubjectivity and evidentiality. His talk addressed specifically the Mandarin evidential marker 过 guo, and the path it took from a verb meaning ‘to get through, to pass by’ to becoming a verbal grammatical element. He argued that this exemplified a path for an evidential marker to originate from a traversative structure – a phenomenon not noted in the literature on this kind of grammaticalisation, which focuses on two other paths of development, from verbal constructions conveying a result or a completion. Vittorio’s work is extremely valuable, not only in its own right but as a demonstration of the role that corpus-based analysis, and cross-linguistic evidence, have to play in linguistic theory. Given Richard’s own work on the grammar and semantics of aspect in Chinese, a celebration of Richard’s career would not have been complete without an illustration of how this trend in current linguistics continues to develop.

All in all, the event was a magnificent tribute to Richard and his highly productive research career, and a potent reminder of how diverse his contributions to the field have actually been, and of their far-reaching impact among practitioners of Chinese corpus linguistics. The large and lively audience certainly seemed to agree with our assessment!

Our deep thanks go out to all the invited speakers, especially those who travelled long distances to attend – our speaker roster stretched from California in the west, to China in the east.

Introducing the Corpus of Translational English (COTE)

We are pleased to announce that CASS has recently compiled another new corpus, the Corpus of Translational English (COTE). The construction of COTE is supported by the joint ESRC (UK) – RGC (Hong Kong) research project, “Comparable and Parallel Corpus Approaches to the Third Code: English and Chinese Perspectives” (ES/K010107/1). The project is led by Dr Richard Xiao and Dr Andrew Hardie at CASS in collaboration with Dr Dechao Li and Professor Chu-Ren Huang of the Hong Kong Polytechnic University.

COTE is a one-million-word balanced comparable corpus of translated English texts, designed as a translational counterpart of the Freiburg–LOB Corpus of British English (F-LOB). The new corpus is intended to match F-LOB as closely as possible in size and composition, while representing translational English published in the 1990s. Like the F-LOB corpus, COTE comprises five hundred text samples of around 2,000 words each, distributed across 15 text categories. The corpus has been created with the explicit aim of providing a reliable empirical basis for two things: identifying the typical common features of translated English texts, and investigating how those features vary across different types of text. Both rest on quantitative analysis of the balanced corpus of translational English in contrast with comparable corpora of native English.

Like many balanced native English corpora such as F-LOB, COTE includes metadata information such as text type and date of publication as well as linguistic annotation such as part-of-speech tagging. But as a translational English corpus, COTE additionally includes various translation-specific metadata, e.g. the source language, translator, date and source of publication in the header of each text sample, which makes it possible to categorize the texts to suit different research purposes. The corpus is currently restricted for in-house use by the project team. It will be released and made accessible online when the project is completed.

Related outputs:

Hu, X. (2014). Does the Style of Translation Exist? A corpus-based Multidimensional Analysis of the stylistic features of the translated Chinese. Paper presented at the Second Asia Pacific Corpus Linguistics Conference. 7 – 9 March, the Hong Kong Polytechnic University.

Hu, X. & Xiao, R. (2014). How different is English translation from native writings of English? A multi-feature statistical model for linguistic variation analysis. Paper presented at the 35th ICAME conference. 30 April to 4 May, the University of Nottingham.

Hu, X. & Xiao, R. (2014). What role do Source Languages play in the variation of translational English? A corpus-based survey of Source Language interference. Paper presented at the 7th IVACS conference, 19-21 June 2014, Newcastle University.

Xiao, R. & Hu, X. (2014). General tendencies and variations of translational English across registers. Paper presented at the 4th UCCTS conference, 24-26 July 2014, Lancaster University.

McEnery, A. & Xiao, R. (2014). The development of corpus linguistics in English and Chinese contexts. In Ishikawa, S. (ed.) Learner Corpus studies in Asia and the World: Papers from LCSAW2014, Vol. 2, pp. 7-45. Kobe, Japan: Kobe University.

Hu, X., Xiao, R. & Hardie, A. (under preparation). How do English translations differ from native English writings? A multi-feature statistical model for linguistic variation analysis.

Newby Fellow appointed to CASS

The Department of Linguistics and English Language has recently appointed a Newby Fellow, Dr. Helen Baker, to work on the CASS project entitled ‘Newspapers, Poverty and Long-Term Change. A Corpus Analysis of Five Centuries of Texts’.

Dr. Baker is a social historian who was awarded her Ph.D. in Russian History at the University of Leeds in 2002. Her thesis examined popular reactions to the Khodynka disaster, a stampede which took place during the coronation celebrations of Nicholas II in 1896. She taught Russian and European history at the University of Bradford before working as a teaching assistant in the Department of Russian and Slavonic Studies at the University of Leeds between 2003 and 2007.

Helen Baker has previously worked as a transcriber and historical researcher for the Department of Linguistics and English Language, completing a historical chronology of the Scottish Glencairn Uprising of 1653 for the British Academy-funded ‘Newsbooks at Lancaster’ project. This research sparked an interest in early modern history and she went on to investigate the lives of seventeenth-century English prostitutes. Her first book, co-authored with CASS Centre Director, Professor Tony McEnery, is forthcoming and uses the study of early-modern prostitution as a case study to illustrate that historians and corpus linguists have much to gain through academic collaboration.

The project ‘Newspapers, Poverty and Long-Term Change’, which is funded by the Newby Trust, aims to assemble the largest ever corpora of newspapers and related material from 1473 to 1900 and to use them to investigate changing discourses on poverty across this period. Dr. Baker will officially join the project on 1 July 2014, working with Professor Tony McEnery, Dr. Andrew Hardie, and Professor Ian Gregory.

The appointment will mean something of a home-coming for Helen Baker, who studied for her undergraduate degree in the History Department at Lancaster University between 1994 and 1997.

More about the Metaphor in End of Life Care project at Lancaster University

The CASS-affiliated Metaphor in End of Life Care project has just released a free resource containing information of interest to many of our readers. Download the document now to learn more about the project, from basic concepts (what is metaphor, and how is it used in everyday life?) to more specific details (why study metaphor in end-of-life care?). Some interesting initial findings are also included. For instance, “Family carers often say that their emotions can only be safely ‘released’ when talking to people who are ‘in the same boat’.” Read on to learn more about the project.

Introducing CASS PhD student Amelia Joulain-Jay

I am Amelia Joulain-Jay and I have just started some corpus-based doctoral research on the representation of places in nineteenth-century British newspapers. I grew up in Belgium, the daughter of an American mother and a French father, and this multi-lingual and multi-cultural environment fed my curiosity about the way people interact and communicate. After some post-secondary school volunteer work in India, Ecuador and Spain, I started studies in the Dalcroze pedagogy of music in Brussels before realizing I wanted to go to university.

My desire to unpick the relationship between language and society brought me to Lancaster University to study sociolinguistics and sociology. Once there, I discovered Corpus Linguistics and was impressed by its ability to handle volumes of textual evidence. The opportunity to further develop my skills in Corpus Linguistics by undertaking PhD research under the supervision of Andrew Hardie and Ian Gregory was too exciting to overlook, and I am delighted to be working surrounded by researchers in the CASS centre.

My research project is part of the ERC-funded Spatial Humanities project which aims to develop ways of analysing text using the affordances of Geographical Information Systems to benefit fields in the Humanities. The research for my PhD will be the first large-scale application of a method combining Corpus Linguistics and Geographical Information Systems to uncover spatial patterns in a large quantity of text. The material under study will come from the British Library’s recently digitized archive of nineteenth-century newspapers; hence the research is expected to make a valuable contribution to the field of nineteenth-century history.

Extra-academic facts about me you may find interesting: I am Baha’i; this year is the second year that I am one of the Lancaster University Music Society Choir’s conductors; my husband and I have recently set up a jazz band in which I sing.

CASS awarded £200,000 from landmark ESRC Urgency Grant Scheme

CASS is delighted to announce a successful ESRC application for funding on a project entitled “Twitter rape threats and the discourse of online misogyny” (ES/L008874/1). The award of £191,245.25 was one of the first (possibly even the first) to be made as part of the ESRC’s new Urgency Grants scheme. Under this scheme, applications are assessed very quickly, and projects also start within four weeks of a successful award. This particular project will begin in November and run for fourteen months. It will be part of the CASS Centre, and the team will comprise Claire Hardaker (PI), Tony McEnery (CI), Paul Baker (CI), Andrew Hardie (CI), Paul Iganski (CI), and two CASS-hosted research assistants.

This project will investigate the rape and death threats sent on Twitter in July and August 2013 to a number of high profile individuals, including MP Stella Creasy and journalist Caroline Criado-Perez. This project seeks to address the remarkable lack of research into such behaviour, especially in light of the fact that policymakers and legislators are under intense pressure to make quick, long-term decisions on relevant policy and procedure to allow enforcement agencies to act on this issue. Specifically, the project will investigate what the language used by those who send rape/death threats on Twitter reveals about…

  1. their concerns, interests, and ideologies; what concept do they seem to have of themselves and their role in society?
  2. their motivations and goals; what seems to trigger them? What do they seem to be seeking?
  3. the links between them and other individuals, topics, and behaviours; do they only produce misogynistic threats or do they engage in other hate-speech? Do they act alone or within networks?

The project will take a corpus approach, incorporating several innovative aspects, and it will produce results that should be relevant to several social sciences including sociology, criminology, politics, psychology, and law. It will also offer timely insight into an area where policy, practice, legislation, and enforcement are currently under intense scrutiny and require such research to help shape future developments. As such, the results will likely be of interest to legislators, policymakers, investigative bodies, and law enforcement agencies, as well as the study participants, media, and general public.