A new research associate has joined CASS

A new research associate, Dr Xianyao Hu, has recently been appointed in the Department of Linguistics and English Language to work on the CASS project “Comparable and Parallel Corpus Approaches to the Third Code”, led by Dr Richard Xiao, the PI of the project.

Dr Hu is a Professor of Translation Studies from Southwest University in China. He was awarded his PhD in Translation Studies by East China Normal University in 2006, specializing in Corpus-based Translation Studies. Since 2004, Dr Hu has published a range of research articles and a book on corpus-based empirical studies of Translational Norms and Universals. From 2008 to 2010, he worked as a postdoctoral research fellow at Beijing Foreign Studies University on a project based on a sizable bidirectional parallel corpus of Chinese and English. He was a teaching assistant on the MA Programme of English-Chinese Translating and Interpreting at the University of Salford in 2005-2006, and spent a year as a Fulbright Visiting Scholar working on a corpus project at the University of California, Los Angeles in 2011-2012.

Dr Hu joined CASS on 1st November 2013, the official start date of the project. He will work with Dr Richard Xiao on the UK component of the international project, which is undertaken collaboratively by Lancaster University and the Hong Kong Polytechnic University and jointly funded by the ESRC in the UK and the Research Grants Council in Hong Kong.

Visiting With The Brown Family

In 2011 I gave a plenary talk on how American English is changing over time (contrasting it with British English), using the Brown Family of corpora. Each member of the Brown family consists of a corpus of 1 million words of written, published, standard English, divided into 500 files of about 2000 words each. Fifteen genres of writing are represented. This framework was created decades ago, when the original Brown corpus was compiled by Henry Kučera and W. Nelson Francis at Brown University. That corpus has the distinction of being the first publicly available corpus ever built. Containing only American texts published in 1961, it originally went by the name of A Standard Corpus of Present-Day Edited American English for use with Digital Computers but later became known as just the Brown Corpus. It was followed by an equivalent British version, with later members representing English from the 1990s, the 2000s and the 1930s. A 1901 British version is in the pipeline.

Before I gave my talk, however, Mark Davies gave a brilliant presentation on the COHA (Corpus of Historical American English), which has 400 million words and covers the period from 1800 to the present day. It was the proverbial hard act to follow. Compared to the COHA, the Brown family are tiny, and the coverage comes in snapshots 30 or 15 years apart, rather than representing every year. If we identify, say, that the word Mr is less frequent in 2006 than in 1991, then it is tempting to say that Mr is becoming less frequent over time. But we don’t know for certain what corpora from all the years in between would tell us. Having multiple sampling points presents a more convincing picture, but judicious hedging must still be applied.
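As a rough illustration of what comparing two snapshots involves, the sketch below normalises a word's frequency per million words and applies the two-term log-likelihood statistic that is commonly used for corpus comparison. The post does not say which statistic was used in the talk, so the choice of test is an assumption, and the counts for Mr are invented placeholders rather than real Brown family figures.

```python
import math


def per_million(freq, corpus_size):
    """Normalise a raw frequency to occurrences per million words."""
    return freq * 1_000_000 / corpus_size


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Two-term log-likelihood (G2) for one word in two corpora.

    Values above roughly 3.84 correspond to p < 0.05 (1 degree of freedom).
    """
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    # Expected frequencies if the word were equally common in both corpora.
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


# Hypothetical counts of "Mr" in two 1-million-word snapshots (NOT real data).
MR_1991, MR_2006 = 1500, 1200
CORPUS_SIZE = 1_000_000

if __name__ == "__main__":
    print("1991 per million:", per_million(MR_1991, CORPUS_SIZE))
    print("2006 per million:", per_million(MR_2006, CORPUS_SIZE))
    print("G2:", round(log_likelihood(MR_1991, CORPUS_SIZE, MR_2006, CORPUS_SIZE), 2))
```

Even a large value here only says that the two snapshots differ. It says nothing about the years in between, which is why more sampling points, and some hedging, are needed before talking about a trend.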

Also, because the corpora are small, many words in the Brown family have tiny frequencies, so it’s very difficult to make any claims about them. And the sampling could be viewed as rather outdated – the sorts of texts that people accessed in the 1960s are not necessarily the same as those they access now. There are no online texts in the Brown family (although to ease collection, both the 2006 members involved texts that were originally published in written form, then placed online). Nor is there any advertising text. Or song lyrics. Or horror fiction. Or erotica (although there is a section on Romantic Fiction which could be pushed in that direction). Finally, the fact that all the texts are of the published variety means that they tend to represent a somewhat standardised, conservative form of English. A lot of the innovation in English happens in much more informal contexts, especially where young people or people from different backgrounds mix together – inner-city playgrounds and internet forums being two good examples. By the time such innovation gets into written published standard English, it’s no longer innovative. So the Brown family can’t tell us about the cutting edge of language use – they’ll always be a few years out of fashion.

So what are the Brown family good for, if anything?
