A criminologist’s introduction to AntConc and concordance analysis

My name is Julian Hargreaves (j.hargreaves2@lancaster.ac.uk) and I’m a newcomer to these parts: a non-linguist and an outsider. Okay, the last bit is a slight exaggeration. I’m a member of the CASS Challenge Panel (an advisory board within CASS) representing post-graduate students from disciplines other than linguistics. I’m also a PhD student at the Lancaster University Law School where my research employs a mixture of quantitative and qualitative methods to study criminology, hate crime, British Muslim communities, and the concept of Islamophobia.

Recently, thanks to Professor Tony McEnery and the CASS team, I was introduced to some research tools for linguistics: a piece of software called AntConc and a research method known as concordance analysis. Before the linguistic experts amongst you start groaning, a quick health warning: I’m afraid what follows here may be of little use to those familiar with these basic tools. However, it is hoped that newcomers and non-linguists will be persuaded to approach, without anxiety, both the software and the research methods described below.

My first task was to collect some textual data for analysis in an introductory session. I chose three years’ worth of press releases from Muslim organisations: organisations that campaign for and represent the interests of British Muslim communities. I use the term ‘Muslim organisations’ with caution and only as shorthand for the above definition. I assume that these organisations are staffed predominantly by Muslims but take care not to presume that this is necessarily the case. Organisations selected for examination included the Muslim Council of Britain, the Islamic Human Rights Commission, and Cageprisoners.

I was interested in how British Muslim communities are constructed within press releases, how issues around those communities are described, and whether these constructions and depictions challenge or support those found within scholarly literature concerning British Muslims. Press releases were chosen as a source of textual data because they are easily accessed online and tend to cover a broad range of topics (from political campaign announcements to messages offering seasonal greetings). This accessibility and variety mean they may be used collectively as a barometer of attitudes and opinions within the organisations that produce them. Another advantage of using press releases is that the selected texts are usually short (often fewer than 200 words). This allows for a manageably sized pool of data drawn from a period long enough to discern trends and patterns reliably. Also, and most importantly for this beginner, the language used in press releases is very often clear, succinct and thus largely unambiguous. Given all of these characteristics, I hoped that the press releases would provide small, useful chunks of data capable of answering the research questions.

Collecting the text simply involved cutting and pasting substantive paragraphs from the press releases (headings, names, addresses and telephone numbers were ignored). Extracts were collected in a Word document, which was then converted into a .txt file (all very straightforward thanks to help from Amanda Potts in CASS). The result was a body of text just over 45,000 words long and corpus-like in appearance. Was it a corpus in the strictest sense of the word? I’ll have to defer to a linguist on that one! The text was analysed using some of AntConc’s basic functions (including Keyword List, Collocates, and Clusters). These functions were accessed with drop-down menus and clearly designed buttons: AntConc is as easy to operate as Word.
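For readers who would like to see what a concordance actually is, here is a minimal Python sketch of the idea: split a text into words, then list every occurrence of a search word together with a few words of context on either side. This is not how AntConc itself is implemented, and the function names (`tokenise`, `concordance`) and the two sample sentences are made up for illustration rather than taken from the press releases.

```python
import re

def tokenise(text):
    """Lowercase the text and split it into word tokens.

    Assumption: a word is a run of letters, optionally with internal
    apostrophes. Hyphenated words are split into their parts.
    """
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())

def concordance(tokens, node, width=4):
    """Return (left context, node, right context) for each hit of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

# Invented sample text standing in for the contents of a .txt file.
sample = ("The vast majority of Muslims condemn the attack. "
          "Muslims up and down the country welcomed the statement.")

tokens = tokenise(sample)
print(len(tokens), "tokens")
for left, node, right in concordance(tokens, "muslims"):
    # Right-align the left context so the search word lines up in a column.
    print(f"{left:>30} | {node} | {right}")
```

With a real file, `sample` would simply be replaced by the text read from disk. Lining the search word up in a central column is what makes repeated patterns on either side of it easy to spot by eye.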

Analysis of the text revealed some interesting and somewhat surprising findings. AntConc allowed us to observe the use of repeated stock phrases throughout much of the text. This suggested a rather formulaic style; maybe even a ‘cut and paste’ approach to the business of writing. We observed a strong tendency by the authors to describe British Muslim communities as a single homogeneous group using recurring, rather clichéd phrases. For instance, ‘the British Muslim community’ was repeated three times as often as ‘Muslim communities’. Also repeated frequently were phrases such as ‘the vast majority of Muslims’ and ‘Muslims up and down the country’. These findings challenge certain conclusions within the literature (e.g. the Runnymede Trust report Islamophobia: A Challenge to Us All) which argue that many Muslim scholars and commentators favour discussion which emphasises the plural and diverse nature of Muslim communities. Further, there is some suggestion that discourse which does otherwise risks being interpreted as methodologically weak, or worse, as a form of prejudice. Here is some evidence to challenge these assertions. The authors of the press releases seemed willing to assert a singular, monist construction of Muslims and Muslim communities: British Muslims are frequently described as belonging to a single group. Perhaps these singular constructions allow the authors to position themselves more easily as being representative of all Muslims in the UK.
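The kind of phrase counting described above can be approximated in a few lines of Python. The sketch below counts how often given phrases occur and lists the most frequent four-word sequences, which is roughly what a ‘clusters’ view provides. It is an illustration only: the helper names (`ngrams`, `phrase_count`) are hypothetical, and the toy text is invented for the example and is not drawn from the press releases, so the counts it produces say nothing about the real data.

```python
import re
from collections import Counter

def tokenise(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())

def ngrams(tokens, n):
    """Return every sequence of n consecutive tokens as a single string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def phrase_count(tokens, phrase):
    """Count occurrences of a multi-word phrase in the token list."""
    target = phrase.lower().split()
    n = len(target)
    return sum(1 for i in range(len(tokens) - n + 1)
               if tokens[i:i + n] == target)

# Invented toy text, used only to show the mechanics.
toy_text = ("The British Muslim community welcomes the report. "
            "The British Muslim community calls for calm. "
            "Muslim communities across the country responded. "
            "The British Muslim community stands united.")

tokens = tokenise(toy_text)
for phrase in ("the British Muslim community", "Muslim communities"):
    print(phrase, "->", phrase_count(tokens, phrase))

# The most frequent four-word clusters in the toy text.
clusters = Counter(ngrams(tokens, 4)).most_common(3)
for cluster, freq in clusters:
    print(freq, cluster)
```

Counting every four-word sequence, rather than searching only for phrases chosen in advance, is what allows unexpected stock phrases to surface.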

AntConc is able to compare the frequency of specific words within the chosen text with the frequency of their appearances in another corpus, usually of more general English (a function known as ‘Keywords’). The use of keywords enables researchers to observe characteristics peculiar to the text being examined. We were alerted to the unusual frequency of the word ‘and’. You may well be thinking, ‘So what?’ However, this seemingly mundane finding led to some very interesting insights, although they are described here rather tentatively. A more robust examination of keywords would have been achieved by comparing the chosen text with a larger reference corpus than the one used for my introductory session. Notwithstanding this methodological shortcoming, AntConc’s log-likelihood test deemed the frequency of ‘and’ to be statistically significant. In plainer English, the word ‘and’ was used more often in the chosen text than in general English, and the difference was large enough that it is unlikely to have occurred by chance alone: a real and recognisable characteristic of the text.

Further investigation revealed that ‘and’ occurred frequently as part of the phrase ‘racism and Islamophobia’. This finding prompted reflection upon the nature of the word ‘Islamophobia’ and its usage in press releases by Muslim organisations. The authors’ reliance on the phrase ‘racism and Islamophobia’ may be related to the conceptualisation of anti-Muslim prejudice and hostility. But why position these two concepts together? Why not pair Islamophobia with another concept, as in ‘anti-Semitism and Islamophobia’? Or why not just use ‘racism’? The authors may wish to construct and position the concept of Islamophobia as a distinct and unique phenomenon whilst simultaneously triggering within the reader commonly held understandings of racism and anti-racism discourse. Use of these words together may well act as a kind of discursive short-cut. Perhaps in this context the concept of Islamophobia is allowed to borrow meaning from the concept of racism, conjuring such imagery as the black civil rights struggle in sixties America or the Stephen Lawrence inquiry in the UK. AntConc seemed able to help answer existing research questions but also seemed to invite the formulation of future questions and research.
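For the curious, the log-likelihood comparison behind a keyword list can be sketched as follows. This uses the standard two-corpus log-likelihood formula commonly applied in corpus linguistics; I am assuming, rather than asserting, that it matches what AntConc computes with its default settings. The function name `log_likelihood` is hypothetical, and every frequency in the example is invented (only the study corpus size of roughly 45,000 words echoes the text above).

```python
import math

def log_likelihood(freq_study, freq_ref, size_study, size_ref):
    """Two-corpus log-likelihood (keyness) score for a single word.

    freq_study, freq_ref: occurrences of the word in each corpus.
    size_study, size_ref: total number of words in each corpus.
    """
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Expected frequencies if the word were equally common in both corpora.
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    score = 0.0
    # A zero observed frequency contributes nothing to the sum.
    if freq_study > 0:
        score += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        score += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * score

# Invented figures: 'and' occurring 1,800 times in a 45,000-word study
# corpus and 27,000 times in a 1,000,000-word reference corpus.
ll = log_likelihood(1800, 27000, 45000, 1000000)

# 3.84 is the chi-square critical value for p < 0.05 with one degree
# of freedom, a conventional threshold for this test.
print(f"LL = {ll:.2f}; significant at p < 0.05: {ll > 3.84}")
```

The score only says that the two frequencies differ by more than chance would readily explain; it does not say which corpus uses the word more, so the relative frequencies still need to be compared directly. It also depends on the reference corpus chosen, which is why a larger one would have made the comparison more robust.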

To conclude, I would highly recommend AntConc, and some element of concordance analysis, to anyone wishing to analyse text. Both software and research method are powerful additions to any toolkit being used to research topics within the social sciences and law. And thankfully, they are straightforward; straightforward enough even for criminology PhDs!