Corpus linguistics MOOC: Second run beginning soon

We are running the corpus MOOC again – and we are really looking forward to it. In the first run of the course we taught social scientists and other researchers from across the globe about how to use corpus linguistics to study language. We looked at a range of topics of contemporary social relevance in doing so – including how we talk about disability and how newspapers write about refugees. We also looked at key areas where corpus linguistics has contributed greatly, notably the areas of dictionary construction and language teaching.

The result, I must say, exceeded our expectations – which were pretty high. People really seemed to like the course and get a lot from it. Even though the approach was entirely new to most students, a very large number worked through all eight weeks of the course. The feedback on our training has been exceptionally strong – a look at the #corpusMOOC hashtag on Twitter will give a good idea of the overwhelmingly positive response to that course. The following quote, from a Chinese notice board on which our MOOC was discussed, gives a strong sense of how the course succeeded both in training students and in showing them that corpora have a key role to play in exploring social science questions (thanks to Richard Xiao for the translation):

“CorpusMOOC, with its assembly of the best corpus linguists and rich content, cannot be praised enough … The greatest benefit for me has been that the course has widened my vision: corpus linguistics and the applications of corpus technologies have gone far beyond what I had imagined – more resembling big data in the field of social science research instead of being confined to linguistics… I think the significance of this course lies not merely in teaching a large number of corpus techniques but more, rather, in introducing corpora and demonstrating what corpora can be used for, thus making us aware of them and helping us understand their importance … the corpus-based approach is the unavoidable approach to language in future.”

The first run of the MOOC had a great impact – the course was taken mainly by women (70.44% of students), and drew participants from all continents and a wide range of countries – including places as far-flung as the British Antarctic Territory! The areas in which course participants were working and researching were heavily oriented to the social sciences, with students drawn from areas such as business consulting and management, health and social care, and media and publishing. The greatest contribution of the course, however, seems to have come from providing training to teachers/lecturers in the UK and beyond. Given that the great majority of students were taking the course for career development (78.59%), the course is likely to have had a strong effect not only on this group but also, by extension, on the students whom those teachers/lecturers go on to introduce to the ideas in the course.

Having read this, you can probably understand why we were keen to run the course again. Through it we have been able to get a good understanding of corpus linguistics across to thousands of people around the globe. We have made a few changes to the course based on the feedback we received – all designed to make a good course better! These include new lectures (for example on the language used in cancer treatment) and new ‘in conversation’ pieces with corpus linguists (such as Douglas Biber).

If this run of the course proves as popular as the first, which we think it should, we plan to run the course every September. Who knows when we will stop!

For a limited time, registration is still open. Book your place on ‘Corpus linguistics: method, analysis, interpretation’ now. 

Introducing CASS Challenge Panel Member: Douglas Biber

This week, we are proud to announce Douglas Biber’s membership on the CASS Challenge Panel. Find his brief autobiographical introduction below.


I have been interested in lots of different research issues over my career, but for the most part, these have all involved the analysis of linguistic variation in natural texts, and the description of the ways in which linguistic features vary across registers. Surprisingly, I began to develop this emphasis even before I got interested in corpus-based approaches.

So, for example, I recently re-read an article that I wrote in 1984, on focus markers in Central Somali, and I was surprised to be reminded that I was already developing this research focus.   In that article, I argued that focus markers in Somali oral narratives are used for important discourse functions — signaling aspects of textual organization and prominence — rather than simply distinguishing between given versus new information.  But the surprising thing to me was that I was already developing the view that register/genre differences are centrally important to the description of language use:  “…pragmatic roles should be studied in a broad range of discourse genres…since each genre may illustrate different functions of the same constructions” (Biber 1984:1).

Shortly afterwards, I started working with computerized corpora, and I have found corpora to be extremely useful as collections of natural texts representing different spoken and written registers. As a result, basically all the research that I’ve carried out since the 1980s has been based on the analysis of corpora.

Beyond my overall interest in register variation, I have used corpus analysis to explore patterns of variation at many different specific linguistic levels, including collocational patterns, phraseological patterns (lexical bundles and lexical frames), lexico-grammatical patterns, grammatical features (with a particular recent interest in grammatical complexity), and discourse units.  I have been interested in the description of register variation from both synchronic and diachronic perspectives, and in describing patterns of register variation in English as well as other languages.  My most influential works have probably been my 1988 Cambridge book Variation across Speech and Writing, where I develop the ‘multi-dimensional’ approach to the analysis of register variation, and the 1999 co-authored Longman Grammar of Spoken and Written English.  But I’ve authored or co-authored a dozen other books over the years, on topics ranging from linguistic variation among university registers, to corpus approaches to the analysis of discourse organization, to cross-linguistic patterns of register variation.  At present, I’m actively working on several projects, including a book describing the historical development of grammatical complexity features in written registers, a major NSF-sponsored grant project to describe the patterns of linguistic variation among Web registers, and an ETS-sponsored grant project to describe the patterns of linguistic variation among university student written registers across disciplines.


Visit the CASS Challenge Panel page to read about our other members.