Newby Fellow appointed to CASS

The Department of Linguistics and English Language has recently appointed a Newby Fellow, Dr. Helen Baker, to work on the CASS project entitled ‘Newspapers, Poverty and Long-Term Change. A Corpus Analysis of Five Centuries of Texts’.

Dr. Baker is a social historian who was awarded her Ph.D. in Russian History by the University of Leeds in 2002. Her thesis examined popular reactions to the Khodynka disaster, a stampede that took place during the coronation celebrations of Nicholas II in 1896. She taught Russian and European history at the University of Bradford before working as a teaching assistant in the Department of Russian and Slavonic Studies at the University of Leeds from 2003 to 2007.

Helen Baker has previously worked as a transcriber and historical researcher for the Department of Linguistics and English Language, completing a historical chronology of the Scottish Glencairn Uprising of 1653 for the British Academy-funded ‘Newsbooks at Lancaster’ project. This research sparked an interest in early modern history, and she went on to investigate the lives of seventeenth-century English prostitutes. Her first book, co-authored with CASS Centre Director Professor Tony McEnery, is forthcoming; it uses early modern prostitution as a case study to show that historians and corpus linguists have much to gain from academic collaboration.

The project ‘Newspapers, Poverty and Long-Term Change’, which is funded by the Newby Trust, aims to assemble the largest corpora ever built of newspapers and related material from 1473 to 1900, and to use them to investigate changing discourses on poverty across this period. Dr. Baker will officially join the project on 1 July 2014, working with Professor Tony McEnery, Dr. Andrew Hardie, and Professor Ian Gregory.

The appointment will be something of a homecoming for Helen Baker, who studied for her undergraduate degree in the History Department at Lancaster University from 1994 to 1997.

New CASS: Briefing now available — Opposing gay rights in UK Parliament: Then and now

Opposing gay rights in UK Parliament: Then and now. How has the expression of opposition to gay rights changed in Parliamentary speeches in recent years? How are discussions of gay people involved in these changes? To what extent could these arguments be seen as homophobic? Read this CASS: Briefing on a diachronic corpus-based discourse analysis to find out more.


Resources are being added regularly to the new CASS: Briefings tab above, so check back soon.

Visiting With The Brown Family

In 2011 I gave a plenary talk on how American English is changing over time (contrasting it with British English), using the Brown family of corpora. Each member of the Brown family is a corpus of 1 million words of written, published, standard English, divided into 500 files of about 2,000 words each. Fifteen genres of writing are represented. This framework was created decades ago, when the original Brown Corpus was compiled by Henry Kučera and W. Nelson Francis at Brown University; it has the distinction of being the first publicly available corpus ever built. Containing only American texts published in 1961, it originally went by the name of A Standard Corpus of Present-Day Edited American English for use with Digital Computers, but later became known simply as the Brown Corpus. It was followed by an equivalent British version, with later members representing English from the 1990s, the 2000s and the 1930s. A 1901 British version is in the pipeline.

Before I gave my talk, however, Mark Davies gave a brilliant presentation on COHA (the Corpus of Historical American English), which contains 400 million words and covers the period from 1810 to the present day. It was the proverbial hard act to follow. Compared to COHA, the Brown family corpora are tiny, and rather than representing every year, they offer snapshots taken 30 or 15 years apart. If we find, say, that the word Mr is less frequent in 2006 than in 1991, then it is tempting to say that Mr is becoming less frequent over time. But we don't know for certain what corpora from all the years in between would tell us. Having multiple sampling points gives a more convincing picture, but judicious hedging is still needed.
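To make the sampling-point problem concrete, here is a minimal Python sketch. It assumes raw counts are normalised to frequencies per million words (a common corpus-linguistics convention, so that snapshots of slightly different sizes can be compared). The counts for Mr below are invented purely for illustration and are not real Brown-family results, and the helper names `per_million` and `trend_between_snapshots` are hypothetical. The function only reports what happens at the sampled years; it says nothing about the years in between.

```python
def per_million(count: int, corpus_size: int) -> float:
    """Normalise a raw frequency to occurrences per million words."""
    # Multiply before dividing so whole-number inputs stay exact where possible.
    return count * 1_000_000 / corpus_size


def trend_between_snapshots(freqs: dict[int, float]) -> str:
    """Describe the direction of change across the sampled years only.

    Nothing is known about the years between sampling points, so this
    reports what the snapshots show, not a continuous trend.
    """
    years = sorted(freqs)
    values = [freqs[y] for y in years]
    if len(values) < 2:
        return "need at least two sampling points"
    if all(a > b for a, b in zip(values, values[1:])):
        return "falls at every sampling point"
    if all(a < b for a, b in zip(values, values[1:])):
        return "rises at every sampling point"
    return "no consistent direction across sampling points"


# HYPOTHETICAL raw counts of "Mr": year -> (count, corpus size in words).
# These figures are invented for illustration; they are NOT real corpus results.
hypothetical_counts = {
    1931: (1500, 1_000_000),
    1961: (1200, 1_000_000),
    1991: (900, 1_000_000),
    2006: (600, 1_000_000),
}

freqs = {year: per_million(c, n) for year, (c, n) in hypothetical_counts.items()}
for year in sorted(freqs):
    print(year, round(freqs[year], 1))
print("Mr", trend_between_snapshots(freqs))
```

Even when the invented values fall at every sampling point, the result describes four snapshots rather than a continuous decline, so the hedging described above still applies.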

Also, because the corpora are small, many words in the Brown family occur only a handful of times, which makes it very difficult to make any claims about them. And the sampling could be viewed as rather outdated: the sorts of texts that people accessed in the 1960s are not necessarily the same as those they access now. There are no online texts in the Brown family (although, to ease collection, both 2006 members were built from texts that were originally published in print and then placed online). Nor is there any advertising text. Or song lyrics. Or horror fiction. Or erotica (although there is a section on Romantic Fiction which could be pushed in that direction).

Finally, because all the texts are published, they tend to represent a somewhat standardised, conservative form of English. A lot of the innovation in English happens in much more informal contexts, especially where young people or people from different backgrounds mix together, inner-city playgrounds and internet forums being two good examples. By the time such innovation gets into written, published standard English, it is no longer innovative. So the Brown family can't tell us about the cutting edge of language use; they'll always be a few years out of fashion.

So what are the Brown family good for, if anything?
