The Spoken BNC2014 early access projects: Part 2

In January, we announced the recipients of the Spoken BNC2014 Early Access Data Grants. Over the next several months, they will use exclusive access to the first five million words of Spoken BNC2014 data to carry out a total of thirteen research projects.

In this series of blogs, we are excited to share more information about these projects, in the words of their authors.

In Part 2 of our series, read about the work of Chris Ryder et al., Andreea Calude, and Barbara McGillivray et al.


Chris Ryder, Jacqueline Laws and Sylvia Jaworska

University of Reading, UK

From oldies to selfies: A diachronic corpus-based study into changing productivity patterns in British English suffixation

The data from the Spoken BNC2014 early access subset will provide a unique opportunity to examine changes that have occurred in affix use in spoken British English over a twenty-year period; for example, the word selfie has only entered general usage since the invention of the iPhone. Using the recently developed MorphoQuantics database containing complex word data for 222 word-final affixes from the demographically sampled subset of the original Spoken BNC, direct comparisons can be made between old and new datasets, focussing on suffixation patterns, changes in productivity, and trends that demonstrate the shifts in semantic scope of individual suffixes. These features will be analysed chiefly through an examination (both quantitative and qualitative) of neologisms within the data, specifically regarding their regularity of construction, occurrence, and meaning.

This study is just one example of the diachronic morphological analyses that will be made available through a comparison of the Spoken BNC2014 EAS and the Spoken BNC, by utilising the categorisation system provided by MorphoQuantics.


Andreea Calude

University of Waikato, New Zealand

Sociolinguistic variation in cleft constructions: a quantitative corpus study of spontaneous conversation

This project concerns links between the use of various grammatical constructions and sociolinguistic variation: for example, is grammar used differently by men and women, or by younger and older speakers? We know that such variation can be observed for certain phonological features (e.g., some vowel sounds) and for certain pragmatic constructions (e.g., discourse markers and new and given information), but as regards grammatical features, the answer remains largely unknown or at best vague.

I intend to use the Spoken BNC2014 early access subset to investigate cleft constructions from a sociolinguistic variationist perspective, with the aim of uncovering (potential) systematic syntactic variation across age, gender, dialect, and socio-economic status. Clefts constitute the most frequently used focusing strategy in English, with demonstrative clefts being among the most common in spontaneous conversation, for example: “That is what I want to study”, “This is where I was born”. Despite intense diachronic and synchronic study of the structure and function of clefts in English, virtually nothing is known about the relationship between cleft use and sociolinguistic variation.

The Spoken BNC2014 data will be coded for all demonstrative clefts using a combination of manual and automatic detection, and each construction identified will be attributed to a particular speaker profile (in terms of their sociolinguistic features). Three linguistic features will also be coded for each construction, namely discourse function, reference direction (cataphoric or anaphoric), and information structure (amount of new and given information included). The data will be analysed using a mixed effects generalised linear regression model.


Barbara McGillivray1, Gard Buen Jenset1 and Michael Rundell2

1University of Oxford, UK

2Lexicography MasterClass, UK

The dative alternation revisited: fresh insights from contemporary spoken data

A well-known feature of English grammar is the dative alternation, whereby a verb may be used in an SVOO construction (Give me the money) or in the pattern SVO followed by a PP with the preposition to (Give the money to me). This is quite a well-researched topic, and generalizations have been made about the factors influencing a writer’s choice of one construction or another, and about which verbs show a preference for one of these patterns over the other. However, most of the studies published to date draw either on introspection or on data from written sources. The availability of contemporary, unscripted spoken data takes us into new territory, and offers an exciting opportunity to revisit this topic.

Our plan is to use the data from the Early Access Scheme to investigate verbs whose argument structure preferences include the dative alternation. Once we have all the relevant corpus data from the Spoken BNC2014 early access subset, we will analyse it using state-of-the-art multivariate statistical techniques, in order to account for the interplay of all the potentially significant variables, whether lexical, semantic, syntactic, or social. The proposed study thus exploits many of the unique features of this dataset, including the metadata on speakers and the USAS semantic tagging, to answer questions concerning the possible influence of semantic categories, socio-economic factors, gender, dialect, age, as well as linguistic features on a speaker’s preferences. Once the study is complete, there would be opportunities for fresh comparative studies, either with the original Spoken BNC or with contemporary written data.


Check back soon for Part 3!
