Latest news on the CASS/iCourts collaborative investigation into the language of the law

Earlier this year, a formal collaboration between iCourts and CASS was signed based on our centres’ joint interest in the corpus-based investigation of language in the context of law. We are motivated to analyse legal data linguistically because law is practiced in language, legal judgements are texts, legal arguments are phrases in texts, and legal concepts are expressed in words. One primary argument against analysing legal language from a linguistic perspective is that the data tend to be extremely formulaic and objective. However, findings from our collaborative analyses have shown that legal language exhibits elements of both fixedness and variation. Both sorts of patterns were exposed using corpus-based critical approaches to language.


Sigrun Larsen (Dept. of Law, Lancaster University), Matt Fisher (Tripod Software), Ioannis Panagis (iCourts, University of Copenhagen), Anne Lise Kjær (iCourts, University of Copenhagen), Amanda Potts (CASS, Lancaster University), Tony McEnery (CASS, Lancaster University), Henrik Stampe Lund (iCourts, University of Copenhagen), Paul Rayson (CASS, Lancaster University), Laurence Anthony (CASS visitor, Waseda University)

On our first collaborative project, “Decoding the rule of law: Corpus-based discourse analysis of the construction of achievements of the International Criminal Tribunal for the Former Yugoslavia (ICTY)”, I serve as P.I., collaborating with C.I. Anne Lise Kjær of iCourts. This month, I traveled to Copenhagen to spend 1.5 intensive weeks working at the University of Copenhagen. I arrived prepared to work with two corpora that had previously been collected and cleansed with the help of Matt Fisher (Tripod) and Ioannis Panagis (iCourts): 1) All of the trials and appeals published thus far by the ICTY (10.5 million words); and 2) Annual reports published by the ICTY from 1994-2013 (425,000 words).

Using frequency lists, (contrastive) collocation analysis, n-gram description, and key semantic domain analysis, we have demonstrated the ways in which legal language remains rigid and fixed, and have also described instances in which variation occurs. Because trials (and, to a lesser extent, appeals) are intended to be self-contained documents, we have also been able to trace cases where variation in legal language became problematic, leading to confusion in the court and to more time and money being spent in search of justice.
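For readers unfamiliar with these techniques, a frequency list and an n-gram count can be sketched in a few lines of Python. This is only an illustration of the general idea, not the tooling or pipeline used in the project: the sample sentences are invented rather than taken from the ICTY corpora, and the function names (`tokenize`, `frequency_list`, `ngrams`) are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def frequency_list(tokens):
    """Count how often each word form occurs."""
    return Counter(tokens)


def ngrams(tokens, n):
    """Count every contiguous sequence of n tokens."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


# Invented sample text (an assumption for illustration, not corpus data).
sample = (
    "The Trial Chamber finds that the accused is guilty. "
    "The Trial Chamber finds that the evidence is sufficient."
)

tokens = tokenize(sample)
freq = frequency_list(tokens)
four_grams = ngrams(tokens, 4)

# Sequences that recur point to fixed, formulaic phrasing.
for gram, count in four_grams.most_common(3):
    print(" ".join(gram), count)
```

Recurring sequences such as "the trial chamber finds" are the kind of fixed pattern an n-gram description brings out, while sequences that occur only once mark the points at which wording varies. Note that this simple tokeniser ignores sentence boundaries, so some n-grams span two sentences.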

Analysis for the first phase of our project is now complete, and initial results are being disseminated. Last week, my collaborator Anne Lise Kjær and I presented findings at the fifth international conference for Critical Approaches to Discourse Analysis across Disciplines (CADAAD) at ELTE (Loránd Eötvös University) in Budapest, Hungary. A paper outlining our recommendations for corpus-based critical analyses of legal language and featuring detailed findings of this initial study is in the final stages of preparation, and will be available next year.

iCourts and CASS formalise collaboration, begin first joint project

Last week, I had the honour of returning to iCourts, a centre of excellence for international courts dedicated to investigating the role of international courts in globalising legal order, as well as their impact on politics and society. iCourts is funded by the Danish National Research Foundation and located at the University of Copenhagen. During this visit, I signed the Memorandum of Agreement that iCourts and CASS have concluded, formalising collaboration between the two centres based on our joint interest in the corpus investigation of language in the context of law.


The first joint CASS-iCourts research project, “Decoding the rule of law: Corpus-based discourse analysis of the construction of achievements of the International Criminal Tribunal for the Former Yugoslavia”, has just received funding from the University of Lancaster’s Research Committee under the ESRC’s Radical Futures in Social Science programme. On this project, Anne Lise Kjær (associate professor, iCourts) and I (senior research associate, CASS) are digitising U.N. documents, updating semantic lexicons to deal more effectively with field-specific terminology, and then analysing millions of words of legal language to investigate constructions of abstractions such as ‘the truth’. Some of our findings will be presented at CADAAD 2014 in Budapest, in a paper titled “Key semantic domain analysis as a method of exploring underlying ideologies and self-representation strategies in legal texts”.

While I was at iCourts, I also delivered a three-hour workshop on the topic of “Corpus Tools in Legal and Social Science Research”. The first hour was dedicated to a lecture introducing fundamental techniques in corpus linguistics and suggesting ways in which these might be helpful in the analysis of legal texts. Three tools developed at Lancaster University (CQPweb, Wmatrix, and VARD) were introduced, and a walkthrough demonstrated their basic functionalities. In the second hour, participants took part in guided exercises with provided data sets of legal language, designed to familiarise them with the tools and techniques. In the final hour, participants were invited either to continue with the advanced sections of the guided exercise or to begin work on their own data. At this point, several participants were guided through the installation of AntConc, an additional tool that has been updated in association with the corpus linguistics MOOC and optimised for learners. Using this tool, attendees were able to experiment with techniques on non-English data, specifically in Norwegian, Spanish, and French.

Check back periodically for the latest developments on the collaboration between iCourts and CASS, and for early access to research findings.