Tag: cqpweb

April 28 2014

Blogs, News, Research

Log Ratio – an informal introduction

In the latest version of CQPweb (v 3.1.7) a new statistic for keywords, collocations and lockwords is introduced, called Log Ratio. “Log Ratio” is actually my own made-up abbreviated title for something which is more precisely defined as either the binary log of the ratio of relative frequencies or the binary log of the relative
Continue Reading

February 3 2014

Blogs, News

Using version control software for corpus construction

There are two problems that often come up in collaborative efforts towards corpus construction. First, how do two or more people pool their efforts simultaneously on this kind of work – sharing the data as it develops without working at cross-purposes, repeating effort, or ending up with incompatible versions of the corpus? Second, how do
Continue Reading

September 20 2013

Blogs, News, Research

A new version of EEBO on CQPweb

The version of the EEBO-TCP data that has been available on Lancaster University’s CQPweb server is now rather old (the TCP project adds text to the collection on a rolling basis), and, more importantly, does not contain any annotations. Recently I have devoted some time to running a newer version through UCREL’s standard annotation tools and then mounting the resulting dataset
Continue Reading

August 21 2013

Blogs, News, Research

Visiting With The Brown Family

In 2011 I gave a plenary talk on how American English is changing over time (contrasting it with British English), using the Brown Family of corpora. Each member of the Brown Family consists of a corpus of 1 million words of written, published, standard English, divided into 500 files of about 2000 words each.
Continue Reading