Introducing the Corpus of Translational English (COTE)

We are pleased to announce that CASS has recently compiled a new corpus, the Corpus of Translational English (COTE). The construction of COTE is supported by the joint ESRC (UK) – RGC (Hong Kong) research project, “Comparable and Parallel Corpus Approaches to the Third Code: English and Chinese Perspectives” (ES/K010107/1). The project is led by Dr Richard Xiao and Dr Andrew Hardie at CASS, in collaboration with Dr Dechao Li and Professor Chu-Ren Huang of the Hong Kong Polytechnic University.

COTE is a one-million-word balanced comparable corpus of translated English, designed as a translational counterpart of the Freiburg–LOB Corpus of British English (F-LOB). It is intended to match F-LOB as closely as possible in size and composition, while representing translational English published in the 1990s. Like F-LOB, COTE comprises 500 text samples of around 2,000 words each, distributed across 15 text categories. The corpus was built with the explicit aim of providing a reliable empirical basis for two tasks: identifying the common features of translated English, and investigating how those features vary across text types. Both rely on quantitative comparison of this balanced corpus of translational English with comparable corpora of native English.

Like many balanced native English corpora such as F-LOB, COTE includes metadata such as text type and date of publication, as well as linguistic annotation such as part-of-speech tagging. As a corpus of translational English, however, COTE also records translation-specific metadata in the header of each text sample, e.g. the source language, the translator, and the date and source of publication. This makes it possible to categorize the texts to suit different research purposes. The corpus is currently restricted to in-house use by the project team; it will be released and made accessible online when the project is completed.

Related outputs:

Hu, X. (2014). Does the Style of Translation Exist? A corpus-based Multidimensional Analysis of the stylistic features of the translated Chinese. Paper presented at the Second Asia Pacific Corpus Linguistics Conference, 7-9 March 2014, the Hong Kong Polytechnic University.

Hu, X. & Xiao, R. (2014). How different is English translation from native writings of English? A multi-feature statistical model for linguistic variation analysis. Paper presented at the 35th ICAME conference, 30 April-4 May 2014, the University of Nottingham.

Hu, X. & Xiao, R. (2014). What role do Source Languages play in the variation of translational English? A corpus-based survey of Source Language interference. Paper presented at the 7th IVACS conference, 19-21 June 2014, Newcastle University.

Xiao, R. & Hu, X. (2014). General tendencies and variations of translational English across registers. Paper presented at the 4th UCCTS conference, 24-26 July 2014, Lancaster University.

McEnery, A. & Xiao, R. (2014). The development of corpus linguistics in English and Chinese contexts. In Ishikawa, S. (ed.) Learner Corpus Studies in Asia and the World: Papers from LCSAW2014, Vol. 2, pp. 7-45. Kobe, Japan: Kobe University.

Hu, X., Xiao, R. & Hardie, A. (in preparation). How do English translations differ from native English writings? A multi-feature statistical model for linguistic variation analysis.