  • Log Ratio – an informal introduction

    In the latest version of CQPweb (v 3.1.7) a new statistic for keywords, collocations and lockwords is introduced, called Log Ratio. “Log Ratio” is actually my own made-up abbreviated title for something which is more precisely defined as either the binary log of the ratio of relative frequencies or the binary log of the relative…

  • Using version control software for corpus construction

    There are two problems that often come up in collaborative efforts towards corpus construction. First, how do two or more people pool their efforts simultaneously on this kind of work – sharing the data as it develops without working at cross-purposes, repeating effort, or ending up with incompatible versions of the corpus? Second, how do…

  • A new version of EEBO on CQPweb

    The version of the EEBO-TCP data that has been available on Lancaster University’s CQPweb server is now rather old (the TCP project adds text to the collection on a rolling basis), and, more importantly, does not contain any annotations. Recently I have devoted some time to running a newer version through UCREL’s standard annotation tools and then mounting the resulting dataset…

  • Visiting With The Brown Family

    In 2011 I gave a plenary talk on how American English is changing over time (contrasting it with British English), using the Brown Family of corpora. Each member of the Brown Family consists of a corpus of 1 million words of written, published, standard English, divided into 500 files of about 2000 words each.…
