2014/15 in retrospective: Perspectives on Chinese

Looking back over the academic year as it draws to a close, one of the highlights for us here at CASS was the one-day seminar we hosted in January on Perspectives on Chinese: Talks in Honour of Richard Xiao. This event celebrated the contributions to linguistics of CASS co-investigator Dr. Richard Zhonghua Xiao, on the occasion of both his retirement in October 2014 (and simultaneous taking-up of an honorary position with the University!), and the completion of the two funded research projects which Richard has led under the aegis of CASS.

The speakers included present and former collaborators with Richard – some (including myself) from here at Lancaster, others from around the world – as well as other eminent scholars working in the areas that Richard has made his own: Chinese corpus linguistics (especially, but not only, comparative work), and the allied area of the methodologies that Richard’s work has both utilised and promulgated.

In the first presentation, Prof. Hongyin Tao of UCLA took a classic observation of corpus-based studies – the existence, and frequent occurrence, of highly predictable strings or structures – and pointed out a little-noticed aspect of these highly predictable elements: they often involve lacunae, or null elements, where some key component of the meaning is simply left unstated and assumed. An example of this is the English expression under the influence, where “the influence of what?” is often implicit, but understood to be drugs/alcohol. Tao pointed out that collocation patterns may identify the null elements, but that a simplistic application of collocation analysis may fail to yield useful results for expressions containing null elements. Finally, an extension of the analysis to yinxiang, the Chinese equivalent of influence, showed much the same tendencies – including, crucially, the importance of null elements – at work.

The following presentation came from Prof. Gu Yueguo of the Chinese Academy of Social Sciences. Gu is well-known in the field of corpus linguistics for his many projects over the years to develop not just new corpora, but also new types of corpus resources – for example, his exciting development in recent years of novel types of ontology. His presentation at the seminar was very much in this tradition, arguing for a novel type of multimodal corpus for use in the study of child language acquisition.

At this point in proceedings, I was deeply honoured to give my own presentation. One of Richard’s recently-concluded projects involved the application of Douglas Biber’s method of Multidimensional Analysis to translational English as the “Third Code”. In my talk, I presented methodological work which, together with Xianyao Hu, I have recently undertaken to assist this kind of analysis by embedding tools for the MD approach in CQPweb. A shorter version of this talk was subsequently presented at the ICAME conference in Trier at the end of May.

Prof. Xu Hai of Guangdong University of Foreign Studies gave a presentation on the study of Learner Chinese, an issue which was prominent among Richard’s concerns as director of the Lancaster University Confucius Institute. As noted above, Richard has led a project funded by the British Academy, looking at the acquisition of Mandarin Chinese as a foreign language; Xu is a partner on that project, so this preliminary report on the Guangwai Lancaster Chinese Learner Corpus was timely indeed. This new learner corpus – already in excess of a million words in size, and consisting of a roughly 60-40 split between written and spoken materials – follows the tradition of the best learner corpora for English by sampling learners with many different national backgrounds, but also, interestingly, includes some longitudinal data. Once complete, the value of this resource for the study of L2 Chinese interlanguage will be incalculable.

The next presentation was another from colleagues of Richard here at Lancaster: Dr. Paul Rayson and Dr. Scott Piao gave a talk on the extension of the UCREL Semantic Analysis System (USAS) to Chinese. This has been accomplished by means of mapping the vast semantic lexicon originally created for English across to Chinese, initially by automatic matching, and secondarily by manual editing. Scott and Paul, with other colleagues including CASS’s Carmen Dayrell, went on to present this work – along with work on other languages – at the prestigious NAACL HLT 2015 conference, in whose proceedings a write-up has been published.

Prof. Jiajin Xu (Beijing Foreign Studies University) then made a presentation on corpus construction for Chinese. This area has, of course, been a major locus of activity by Richard over the years: his Lancaster Corpus of Mandarin Chinese (LCMC), a Mandarin match for the Brown corpus family, is one of the best openly-available linguistic resources for that language, and his ZJU Corpus of Translational Chinese (ZCTC) was a key contribution of his research on translation in Chinese. Xu’s talk presented a range of current work building on that foundation, especially the ToRCH (“Texts of Recent Chinese”) family of corpora – a planned Brown-family-style diachronic sequence of snapshot corpora in Chinese from BFSU, starting with the ToRCH2009 edition. Xu rounded out the talk with some case studies of applications for ToRCH, looking first at recent lexical change in Chinese by comparing ToRCH2009 and LCMC, and then at features of translated language in Chinese by comparing ToRCH2009 and ZCTC.

The last presentation of the day was from Dr. Vittorio Tantucci, who has recently completed his PhD at the Department of Linguistics and English Language at Lancaster, and who specialises in a number of issues in cognitive linguistic analysis including intersubjectivity and evidentiality. His talk addressed specifically the Mandarin evidential marker 过 guo, and the path it took from a verb meaning ‘to get through, to pass by’ to becoming a verbal grammatical element. He argued that this exemplified a path for an evidential marker to originate from a traversative structure – a phenomenon not noted in the literature on this kind of grammaticalisation, which focuses on two other paths of development, from verbal constructions conveying a result or a completion. Vittorio’s work is extremely valuable, not only in its own right but as a demonstration of the role that corpus-based analysis, and cross-linguistic evidence, have to play in linguistic theory. Given Richard’s own work on the grammar and semantics of aspect in Chinese, a celebration of Richard’s career would not have been complete without an illustration of how this trend in current linguistics continues to develop.

All in all, the event was a magnificent tribute to Richard and his highly productive research career, and a potent reminder of how diverse his contributions to the field have actually been, and of their far-reaching impact among practitioners of Chinese corpus linguistics. The large and lively audience certainly seemed to agree with our assessment!

Our deep thanks go out to all the invited speakers, especially those who travelled long distances to attend – our speaker roster stretched from California in the west, to China in the east.

Introducing the Corpus of Translational English (COTE)

We are pleased to announce that CASS has recently compiled another new corpus, the Corpus of Translational English (COTE). The construction of COTE is supported by the joint ESRC (UK) – RGC (Hong Kong) research project, “Comparable and Parallel Corpus Approaches to the Third Code: English and Chinese Perspectives” (ES/K010107/1). The project is led by Dr Richard Xiao and Dr Andrew Hardie at CASS in collaboration with Dr Dechao Li and Professor Chu-Ren Huang of the Hong Kong Polytechnic University.

COTE is a one-million-word balanced comparable corpus of translated English texts, which is designed as a translational counterpart of the Freiburg–LOB Corpus of British English (F-LOB). The new corpus is intended to match F-LOB as closely as possible in size and composition, while representing translational English published in the 1990s. Like the F-LOB corpus, COTE comprises five hundred text samples of around 2,000 words each, which are distributed across 15 text categories. The corpus has been created with the explicit aim of providing a reliable empirical basis for identifying the typical common features of translated English texts, and for investigating how those features vary across different types of text. Both aims rest on quantitative analyses of this balanced corpus of translational English in contrast with comparable corpora of native English.

Like many balanced native English corpora such as F-LOB, COTE includes metadata such as text type and date of publication as well as linguistic annotation such as part-of-speech tagging. But as a translational English corpus, COTE additionally includes various translation-specific metadata in the header of each text sample, e.g. the source language, translator, date and source of publication, which makes it possible to categorize the texts to suit different research purposes. The corpus is currently restricted to in-house use by the project team. It will be released and made accessible online when the project is completed.

Related outputs:

Hu, X. (2014). Does the Style of Translation Exist? A corpus-based Multidimensional Analysis of the stylistic features of the translated Chinese. Paper presented at the 2nd Asia Pacific Corpus Linguistics Conference, 7-9 March, the Hong Kong Polytechnic University.

Hu, X. & Xiao, R. (2014). How different is English translation from native writings of English? A multi-feature statistical model for linguistic variation analysis. Paper presented at the 35th ICAME conference, 30 April to 4 May, the University of Nottingham.

Hu, X. & Xiao, R. (2014). What role do Source Languages play in the variation of translational English? A corpus-based survey of Source Language interference. Paper presented at the 7th IVACS conference, 19-21 June 2014, Newcastle University.

Xiao, R. & Hu, X. (2014). General tendencies and variations of translational English across registers. Paper presented at the 4th UCCTS conference, 24-26 July 2014, Lancaster University.

McEnery, A. & Xiao, R. (2014). The development of corpus linguistics in English and Chinese contexts. In Ishikawa, S. (ed.) Learner Corpus studies in Asia and the World: Papers from LCSAW2014, Vol. 2, pp. 7-45. Kobe, Japan: Kobe University.

Hu, X., Xiao, R. & Hardie, A. (under preparation). How do English translations differ from native English writings? A multi-feature statistical model for linguistic variation analysis.

Translation and contrastive linguistic studies at the interface of English and Chinese

A forthcoming special issue of Corpus Linguistics and Linguistic Theory, guest-edited by Dr Richard Xiao and Professor Naixing Wei, President of the Corpus Linguistics Society of China, is now available online as Ahead of Print at the journal website.

This special issue focuses on corpus-based translation and contrastive linguistic studies involving two genetically different languages, namely English and Chinese, which we believe have formed an important interface with its own unique features as a result of their mutual interaction.

Corpora have tremendously benefited translation and contrastive studies, and in the meantime, corpus-based translation and contrastive linguistic studies have also significantly expanded the scope of corpus linguistic research. While contrastive linguistics and translation studies have traditionally been accepted as two separate disciplines within applied linguistics, there are many contact points between the two; and with the common corpus-based approach and the usually shared type of data (e.g. comparable and parallel corpora), corpus-based translation and contrastive linguistic studies have become even more closely interconnected, as demonstrated by the articles included in this special issue.

This special issue of Corpus Linguistics and Linguistic Theory includes five research articles together with an extensive introduction written by the guest editors.

These studies combine contrastive analysis and translation studies on the basis of comparable corpora (either multilingual or monolingual) and parallel corpora of English and Chinese, two of the most widely spoken world languages, which differ genetically. While the decision to involve English and Chinese in the research reported in this volume was largely based on the authors’ strong languages (they are all competently bilingual in Chinese and English), the significance of the typological distance between the two languages covered in these studies should not be underestimated. In comparison with studies of typologically related languages, translation and cross-linguistic studies of genetically distant languages such as English and Chinese can have more important implications for linguistic theorization. Studying such language pairs helps us gain a better appreciation of the scale of variability in the human language system, while theories and observations based on closely related language pairs can give rise to conclusions which seem certain but which, when studied in the context of a language pair such as English and Chinese, become not merely problematized afresh, but significantly more challenging to resolve (cf. Xiao and McEnery 2010).

The studies reported in this special issue embody features at the interface of English and Chinese, and can be expected to have significant practical implications for linguistic theorizing.

Acquisition of Mandarin Chinese as a foreign language

The British Academy has awarded Lancaster University a three-year grant under its International Partnership and Mobility Scheme (IPM 2013). The research partner in the joint project is Guangdong University of Foreign Studies (GDUFS) in China. The project, entitled “The corpus-based approach to the acquisition of Mandarin Chinese as a foreign language”, aims to develop a one-million-word balanced corpus of spoken and written Chinese interlanguage and, on the basis of this corpus, to explore various theoretical and practical issues pertaining to the acquisition of Chinese as a foreign language. The research team includes six staff members from the Linguistics department and the Confucius Institute at Lancaster, as well as six staff members from the Centre for Linguistics and Applied Linguistics (the only national key research centre of its kind approved by the Ministry of Education) and the Institute for International Education at GDUFS. For more information, please contact Dr Richard Xiao (r.xiao@lancaster.ac.uk), the PI of the project.