My experience with the Corpus MOOC

Lancaster University’s MOOC in Corpus Linguistics has been hugely important to me during my doctoral research and I’ve taken it each year since it was first offered in 2014. This is not because I’m an especially slow learner or because I was unsuccessful in all of my previous attempts – it’s because the course has so much to offer that it’s impossible to appreciate all of the different aspects in one go; it requires repeated visits as understanding deepens and new questions emerge.

When I first took the course, I knew nothing about corpus linguistics (or MOOCs for that matter) and went through each week’s materials at a very introductory level, trying to get a handle on the principles and terminology and also learning about tools and techniques. At first, I was apprehensive, ready to bail at any sign of discomfort, but I found the lectures not only easy to follow, but also thoroughly enjoyable and endlessly fascinating. I was hooked! Although the course itself spanned eight weeks, the materials were available on the website long after the course was over. This allowed me to revisit and review tutorials whenever I felt unsure about something, and also start to focus on areas that aligned with my own research interests.

The following year, with the basics under my belt, I decided to take the course for a second time with the intention of tackling the content in more depth and also using my own data for the tutorials. I found that the multiple layers of the course became extremely valuable as I grew more comfortable with different concepts and research in the field, and that my approach to the course had changed. Instead of following the course week by week as I had done the first time, I started to pick and choose different aspects that matched particular stages of my own research.

The third time I took the course, I was driven by an interest in the advanced materials as well as the discussions and comments by other students and mentors. I had so many questions arising from my own research that I felt it would be helpful to hear what others had to say about their own. The forum became an incredibly valuable resource and one that I had not appreciated as a beginner. It is extremely generous of Lancaster to offer such a fantastic course with all the support, resources, knowledge and materials and ask for nothing in return.

And now, even though I’ve completed my doctoral research, I’ve registered for the course for the fourth time. It is such an incredibly diverse and fascinating course, with so many layers and areas of interest, that there is still a great deal for me to learn. And the numerous scholars discussing their research have an enthusiasm and passion for their work that is both infectious and inspirational. Perhaps my husband is right: I’ve become addicted to this MOOC!

The next Corpus MOOC starts 25 September 2017. You can register for free at

The course is intended for anyone interested in quantitative language analysis – no prior knowledge of linguistics or corpora is required.

Would you like to share your experience of the Corpus MOOC? Include #CorpusMOOC in your tweets or other social media posts or get in touch via v.brezina(Replace this parenthesis with the @ sign)

User Involvement: CASS go to CLARIN PLUS workshop

At the beginning of June, I attended the CLARIN PLUS workshop on User Involvement held in Helsinki, the capital of Finland. CLARIN stands for “Common Language Resources and Technology Infrastructure”; it is an international research infrastructure which provides scholars in the social sciences and humanities with easy access to digital language data, and also advanced tools to handle those data sets. The main purpose of the workshop was to share information, good practice, expertise, and ideas on how potential and current users can most benefit from CLARIN services.

I was representing Lancaster University as part of the UK branch of CLARIN, which is led by Martin Wynne at Oxford. Some of the participants, representing CLARIN’s different national consortia, shared success stories of their involvement with the local community.

At the workshop, Johanna Berg, from Sweden, and Mietta Lennes, from Finland, showed us how they made innovative use of the roadshow event format to present some language resources across different institutions in their countries. Mietta also gave us a taste of the very useful tools and corpora that you can find at The Language Bank of Finland.

Another fruitful example presented at the workshop was the Helsinki Digital Humanities Hackathons. The event, which is in its third edition, brings together researchers from computer science, humanities and social sciences for a week of intensive work sharing a diversity of skills. Eetu Mäkelä, one of the organisers of the DHH, demonstrated that it is possible to engage researchers from very different backgrounds and have them working in a complementary way. The impressive results of last year’s edition can be checked out at the DHH16 website.

At the end of two profitable days, Darja Fišer, director of CLARIN-ERIC User Involvement, wrapped up the event by presenting other amazing experiences across several institutions connected to CLARIN. One of the success stories she mentioned was the Corpus Linguistics: Method, Analysis, Interpretation MOOC offered by CASS, which will be running again in Autumn this year (you can register your interest here!). Darja also highlighted the importance of events such as summer schools to reach out to more users. Indeed, Darja shared some incredible resources and insightful ideas at our recent Summer Schools in Corpus Linguistics and other Digital methods (#LancsSS17). Make sure you read our next blog post for a summary of the summer school week!

Is this the way to do Corpus Linguistics? Feedback system for the Corpus Linguistics MOOC

Corpus linguistics (CL) is a set of incredibly versatile methods of language analysis applicable to a number of different contexts. So, for example, if you are interested in language, culture, history or society, corpus linguistics has something to offer. Today, thanks to the amazing development in computer technology, corpus linguistic tools are literally only a mouse click away or a touch away, if you are using a tablet or a smartphone. Are you then ready to get your hands dirty with computational analysis of large amounts of language? If the answer is yes, you have probably already registered for the new massive open online course (MOOC) on Corpus Linguistics, created and run by Tony McEnery and other members of the CASS team. (If you haven’t managed to register yet, you can still do so at the FutureLearn website. The course kicks off on 27th January 2014.)

An essential part of the Corpus Linguistics MOOC is its unique feedback system. You will be given a question, a data set and a software tool, and you will be asked to apply what you have learnt in the MOOC lectures to real language analysis. You will explore a topic using corpus techniques which will enable you to uncover interesting patterns in language data. We have a range of topics in store for you. These include English grammar, British and American language and culture, historical discourse of 17th-century news books and learner language. But don’t worry, we won’t ask you to write an essay on the topic. Instead, we will give you a number of analyses and descriptions of the corpus data and you will decide which ones use the corpus techniques correctly. After you’ve made your decisions, we will provide detailed comments on each of the options. In this way, the CASS Corpus Linguistics MOOC system aims to promote independent learning so that next time you can apply the corpus tools with confidence to answer your own questions.