The Spoken BNC2014 is now available!

On behalf of Lancaster University and Cambridge University Press, it gives us great pleasure to announce the public release of the Spoken British National Corpus 2014 (Spoken BNC2014).

The Spoken BNC2014 contains 11.5 million words of transcribed informal British English conversation, recorded by (mainly English) speakers between the years 2012 and 2016. The situational context of the recordings – casual conversation among friends and family members – is designed to make the corpus broadly comparable to the demographically-sampled component of the original spoken British National Corpus.

The Spoken BNC2014 is now accessible online in full, free of charge, for research and teaching purposes. To access the corpus, you should first create a free account on Lancaster University’s CQPweb server (https://cqpweb.lancs.ac.uk/) if you do not already have one. Once registered, please visit the BNC2014 website (http://corpora.lancs.ac.uk/bnc2014) to (a) sign the corpus’ end-user licence and (b) register your CQPweb account – following the instructions on the site. When you return to CQPweb, you will have access to the Spoken BNC2014 via the link that appears in the list of ‘Present-day English’ corpora. While access is initially only via the CQPweb platform, the underlying corpus XML files and associated metadata will be available for download in Autumn 2018.

The BNC2014 website also contains lots of useful information about the corpus, and in particular a downloadable manual and reference guide, which will be available soon. Further information, as well as the first research articles to use Spoken BNC2014 data, will be available in two in-press publications associated with the project: a special issue of the International Journal of Corpus Linguistics (due next month) and an edited collection in the Routledge ‘Advances in Corpus Linguistics’ series (due early 2018).

The BNC2014 does not end here – we are currently working on transcribing materials provided to us by the British Library to provide a substantial supplement to the corpus – find out more about that here: https://cass.lancs.ac.uk/?p=2241. For now, we will be waiting and watching with interest to see what work the corpus released today stimulates. As ever with corpus data, it does not enable all questions to be answered, but it does allow a very wide range of questions to be investigated.

The Spoken BNC2014 research team would like to express our gratitude to all who have had a hand in the creation of the corpus, and hope that you enjoy exploring the data. We are, of course, keen to hear your feedback about the corpus; this, as well as any questions, can be directed to Robbie Love (r.m.love@lancaster.ac.uk) or Andrew Hardie (a.hardie@lancaster.ac.uk).

Introducing a new project with the British Library

Since 2012 the BBC has been working with the British Library to build a collection of intimate conversations from across the UK in the BBC Listening Project. Through its network of local radio stations, and with the help of a travelling recording booth, the BBC has captured, in high quality audio, many conversations on a range of topics between people who are well known to one another.

For the past two years we have been discussing with the BBC and the British Library the possibility of using these recordings as the basis of a large scale extension of our spoken BNC corpus. The Spoken BNC2014 has been built so far to reflect language in intimate settings – with recordings made in the home. This has led to a large and very useful collection of data but, without the resources of an organization such as the BBC, we were not able to roam the country with a sound recording booth to sample language from John o’Groats to Land’s End! By teaming up with the BBC and British Library we can supplement this very useful corpus of data, which is strongly focused on a ‘hard to capture’ context, intimate conversations in the home, with another type of data, intimate conversations in a public situation sampled from across the UK.

Another way in which the Listening Project data should prove helpful to linguists is that it was captured in a recording studio as high quality audio. Our hope is that a corpus based on this material will be of direct interest and use to phoneticians.

We have recently concluded our discussion with the British Library, which is archiving this material, and signed an agreement which will see CASS undertake orthographic transcription of the data. Our goal is to provide a high quality transcription of the data which will be of use both to linguists and to members of the public who may wish to browse the collection. In doing this we will be building on our experience of producing the Trinity Lancaster Corpus of Spoken Learner English and the Spoken BNC2014.

We take our first delivery of recordings at the beginning of March and are very excited at the prospect of lifting the veil a little further on the fascinating topic of everyday conversation and language use. The plan is to transcribe up to 1000 of the recordings archived at the British Library. We will also be working to time-align the transcriptions with the sound recordings, and we are working closely with our strong phonetics team in the Department of Linguistics and English Language at Lancaster University to begin to assess the extent to which this new dataset could facilitate new work, for example, on the accents of the British Isles.

Our partners in the British Library are just as excited as we are – Jonnie Robinson, lead Curator for Spoken English at the British Library, says ‘The British Library is delighted to enable Lancaster to make such innovative use of the Listening Project conversations and we look forward to working with them to make the collection more accessible and to enhance its potential to support linguistic and other research enquiries’.

Keep an eye on the CASS website and Twitter feed over the next couple of years for further updates on this new project!

Remembering Richard Xiao, 1966-2016

I first met Richard in 2000, when he came to Lancaster to be my PhD student. He was initially interested in doing a PhD in the area of translation studies, but I spoke to him about corpus research and, slowly as the months passed, he decided to use corpora to look at an interesting issue in linguistics – aspect. This was the first of many areas where we happily worked together. Over weeks and months we slowly worked on the problem of integrating corpora and theory, finally arriving at what we both felt was a very satisfactory outcome: a PhD for Richard, a book we wrote on the topic and one or two nice papers.

Early on Richard showed real promise as a researcher so, as I often do with my students, I set Richard onto a few side projects which we pursued together. The first project we worked on was on the F-word in English. I had analysed bad language in the spoken and written BNC, but my book on swearing in English only used the spoken material. So we worked together on the written data and produced the paper ‘Swearing in Modern British English’ which was published in Language and Literature.

That started something of a wave of publications from us – we worked together very well. We had similar interests and personalities, but, most importantly, we felt very comfortable about disagreeing with one another. Those disagreements were always purely intellectual – a cross word never passed between us. They were also not fruitless – we would always debate the point until one or the other of us changed his mind. Working with Richard was a pleasure.

On finishing his PhD Richard started to work as my research assistant. Courtesy of a grant from the UK ESRC we carried on our work on the grammar of Chinese. When I went on secondment from Lancaster University to the UK AHRC, I continued to work with Richard who remained my research assistant. Without Richard working pretty independently of me most of the time while I was on secondment, my time at the AHRC would have been much tougher. As it was, I could focus on the research council work during the day and then check in with Richard in the evening to see how our work was going. The end result was a series of papers on Chinese grammar that I am very proud to be associated with and the book Corpus-Based Contrastive Studies of English and Chinese.

After the grant we were working on finished we hit a snag – we had a very interesting project on Chinese split words lined up, but as I was working for the research council at that time I could not apply to them for a grant and as a research assistant Richard was ineligible to apply. So we wrote the proposal and persuaded our colleague Anna Siewierska to take on the supervisor role on the project. The project was funded and Richard and Anna worked together very well, though I will always regret not being able to be part of that work as it is so interesting. Look at this paper, for example:

http://www.sciencedirect.com/science/article/pii/S0388000109000564

Around the time that this grant was awarded Richard got his first lecturing position at the University of Central Lancashire, moving on to Edge Hill University and finally, to my delight, in 2012 he moved back to Lancaster University, where he was swiftly promoted to Reader.

In the fourteen years from when I first met him to the point where he retired on ill health grounds, if Richard had only done the work described above he would have had a good career. However, he did so much more as his Google Scholar profile shows:

https://scholar.google.com/citations?user=FKclJsYAAAAJ&hl=en

In addition to what he did with me, he also undertook a great range of excellent research on his own, especially in the area of translation studies. Importantly, he contributed to the construction of a wide range of corpora of Mandarin Chinese as can be seen here:

http://www.fass.lancs.ac.uk/projects/corpus/Chinese/

I was delighted when Richard successfully applied to become a British citizen and was honoured to be asked to support his application. I was so pleased to be able to help Richard, his wife Lyn and his daughter in this way.

Sadly, Richard was diagnosed with cancer in 2013. Through surgery, chemotherapy and sheer willpower he survived until the 2nd of January 2016. The length of his illness, while distressing, did allow us time to publicly celebrate his work:

https://cass.lancs.ac.uk/?p=1672

Throughout his illness he was unfailingly cheerful and optimistic. He was also still brimming with ideas – he was writing and undertaking journal and research council reviews until a few months before he left his suffering behind. I have no doubt that if he had survived longer he would have written many more books and papers well worth reading. As it was, when we last spoke together, just before Christmas 2015, we had a lovely time remembering what we had achieved together. Indeed this brief remembrance of Richard contains many of the things we recalled in that conversation. One thing we did was to decide upon our favourite three publications that we had written together. It seems appropriate to share these in his memory – we both thought they were well worth a read! They are:

McEnery, A. M. & Xiao, R. Z. (2004) ‘Swearing in modern British English: the case of fuck in the BNC.’ Language and Literature. 13, 3, pp. 235-268.

(http://www.academia.edu/2997462/Swearing_in_modern_British_English_the_case_of_fuck_in_the_BNC)

McEnery, A. M. & Xiao, R. Z. (2005) ‘HELP or HELP to: What do corpora have to say?’ English Studies. 86, 2, pp. 161-187.

(http://www.lancaster.ac.uk/fass/projects/corpus/ZJU/xpapers/Xiao_help.pdf)

Xiao, R. Z. & McEnery, A. M. (2006) ‘Collocation, semantic prosody and near synonymy: A cross-linguistic perspective.’ Applied Linguistics. 27, 1, pp. 103-129.

(http://www.academia.edu/2997472/Collocation_semantic_prosody_and_near_synonymy_A_cross-linguistic_perspective)

We spent a pleasant time discussing these papers and then we said farewell to each other. I can imagine no better a final conversation between two scholars and friends who worked together so well. I am so happy that we had the chance to have this final meeting of minds. Not only will it be a precious memory for me, I know that it meant a great deal to him. I will miss Richard very much, as will others. However, through his writing his thoughts will live on and as further studies are produced by others on the basis of his corpora, the energy, kindness and ingenuity of Richard Xiao will blaze forth afresh.

25th Anniversary Conference for the Muslim News

I was honoured to attend the 25th Anniversary Conference for the Muslim News on the 15th September. The event was organized by the Society of Editors and the Daily Telegraph had provided the venue – the spectacular Merchant Taylors’ Hall in the City of London. The event began with a speech by Bob Satchwell, Executive Director of the Society of Editors, and a welcoming speech by Lord Black of the Telegraph Media Group. Following that, Fatima Manji of Channel 4 News introduced me and I gave the morning’s keynote speech, discussing the work which I did with Paul Baker and Costas Gabrielatos (Discourse Analysis and Media Attitudes: The Representation of Islam in the British Press) looking at the representation of Islam and Muslims in the UK press. I was also very happy to be able to present some early findings from a follow-up study Paul Baker and I are currently doing, supported by CASS and the Muslim NGO MEND, looking at how things have developed since our work was published. This is based on approximately 80 million more words of data, composed of all UK national newspaper articles mentioning Muslims and Islam in the period 2010-2015.

The audience included a mixture of journalists, newspaper editors and TV news reporters and editors. In addition there were representatives from many faith groups and NGOs present too. The research was very well received by the audience. After the talk a panel was convened to discuss the work and take questions from the audience. The panel included John Wellington, the managing editor of the Mail on Sunday, Doug Wills, managing editor of the London Evening Standard and the Independent group of newspapers and Sue Ryan, former managing editor of the Daily Telegraph and manager of the trainee programme for the Mail group. It was a real privilege to be able to discuss our work with them and I found them to be open to criticism and ready to consider change. One point that emerged from the discussion that was of interest, I thought, was that the press are often criticized for their use of language when that usage is current in general English. While this puts the press in the spotlight, it also means that at times they can be in the vanguard of discussion and change in language use, as the recent discussion of the use of the word ‘migrant’ in the UK media has shown. This makes an engagement with media language all the more important for academic researchers.

Following this panel was a second panel, chaired by Fatima Manji, composed of the editors of ITN news and BBC news (Robin Elias and James Stephenson) as well as Channel 4’s Home Affairs correspondent Simon Israel. Julian Petley, author of Pointing the finger: Islam and Muslims in the British media, gave academic weight to this panel’s discussion. A very thought-provoking discussion ensued about how to achieve a more inclusive and representative newsroom which demonstrated, once again, that the media was willing to engage in discussion and was prepared to embrace change.

After lunch the final session, chaired by Ehsan Masood of Research Fortnight, followed a contribution from Jonathan Heywood of Impress on a Leveson-compliant media watchdog that Impress are developing. A lively debate followed led by the head of IPSO, Sir Alan Moses. Sir Alan was joined by prominent editors from The Sunday Times (Eleanor Mills) and The Observer (Stephen Pritchard) as well as the Managing Editor of the London Evening Standard and Independent Group, Will Gore. A key tension that was highlighted by Sir Alan Moses in the debate was between what in principle may be desirable and what is achievable in reality. He also made the important point that we have to decide as a society where we want regulation to end and a softer form of social regulation to begin. I finished the afternoon with a brief and rewarding discussion of my work with Sir Alan.

The event was a rare and precious opportunity to showcase academic research to a range of key stakeholders and for that opportunity I am very grateful both to MEND and to Muslim News.

Corpus linguistics MOOC: Second run beginning soon

We are running the corpus MOOC again – and we are really looking forward to it. In the first run of the course we taught social scientists and other researchers from across the globe about how to use corpus linguistics to study language. We looked at a range of topics of contemporary social relevance in doing so – including how we talk about disability and how newspapers write about refugees. We also looked at key areas where corpus linguistics has contributed greatly, notably the areas of dictionary construction and language teaching.

The result, I must say, exceeded our expectations – which were pretty high. People really seemed to like the course and get a lot from it. Even though the approach was entirely new to most students, a very large number worked through all eight weeks of the course. The feedback on our training has been exceptionally strong – a look at the #corpusMOOC hashtag on Twitter will give a good idea of the overwhelmingly positive response to that course. The following quote, from a Chinese notice board on which our MOOC was discussed, gives a strong sense of how the course succeeded both in training students and in showing them that corpora have a key role to play in exploring social science questions (thanks to Richard Xiao for the translation):

“CorpusMOOC, with its assembly of the best corpus linguists and rich content, cannot be praised enough … The greatest benefit for me has been that the course has widened my vision: corpus linguistics and the applications of corpus technologies have gone far beyond what I had imagined – more resembling big data in the field of social science research instead of being confined to linguistics… I think the significance of this course lies not merely in teaching a large number of corpus techniques but more, rather, in introducing corpora and demonstrating what corpora can be used for, thus making us aware of them and helping us understand their importance … the corpus-based approach is the unavoidable approach to language in future.”

The first run of the MOOC had a great impact – the course was taken mainly by women (70.44% of students), and drew participants from all continents and a wide range of countries – including places as far flung as the British Antarctic Territory! The areas in which course participants were working and researching were heavily oriented to the social sciences, with students drawn from areas such as business consulting and management, health and social care and media and publishing. The greatest contribution of the course, however, seems to have come from providing training to teachers/lecturers in the UK and beyond. Given that the great majority of students were taking the course for career development (78.59%), the course was likely not only to have had a strong effect on this group but also, by extension, on the students who are exposed to the ideas in the course by the teachers/lecturers who took it.

Having read this, you can probably understand why we were keen to run the course again. Through it we have been able to get a good understanding of corpus linguistics across to thousands of people around the globe. We have made a few changes to the course based on the feedback we received – all designed to make a good course better! These include new lectures (for example on the language used in cancer treatment) and new ‘in conversation’ pieces with corpus linguists (such as Douglas Biber).

If this run of the course proves as popular as the first, which we think it should, we plan to run the course every September. Who knows when we will stop!

For a limited time, registration is still open. Book your place on ‘Corpus linguistics: method, analysis, interpretation’ now. 

Spoken BNC2014 project announcement


We are excited to announce that the ESRC-funded Centre for Corpus Approaches to Social Science (CASS) at Lancaster University and Cambridge University Press have agreed to collaborate on the compilation of a new, publicly accessible corpus of spoken British English called the ‘Spoken British National Corpus 2014’ (the Spoken BNC2014).

The aim of the Spoken BNC2014 project, which will be led jointly by Lancaster University’s Professor Tony McEnery and Cambridge University Press’ Dr Claire Dembry, is to compile a very large collection of recordings of real-life, informal, spoken interactions between people whose first language is British English. These will then be transcribed and made available publicly for a wide range of research purposes.

We aim to encourage people from all over the UK to record their interactions and send them to us as MP3 files. For each hour of good quality recordings we receive, along with all associated consent forms and information sheets completed correctly, we will pay £18. Each recording does not have to be 1 hour in length; participants may submit two 30 minute recordings, or three 20 minute recordings, but for each hour in total, they will receive £18.

The collaboration between CASS at Lancaster University and Cambridge University Press brings together the best resources available for this task. Cambridge University Press is greatly experienced at collecting very large English corpora, and it already has the infrastructure in place to undertake such a large compilation project. CASS at Lancaster University has the linguistic research expertise necessary to ensure that the Spoken BNC2014 will be as useful and accessible as possible for a wide range of purposes. The academic community will benefit from access to a new large spoken British English corpus that is balanced according to a selection of useful demographic criteria, including gender, age, and socio-economic status. This opens the door for all kinds of research projects including the comparison of the Spoken BNC2014 with older spoken corpora.

CASS at Lancaster University and Cambridge University Press are very excited to launch the Spoken BNC2014 project, and we look forward to sharing the corpus as widely as possible once it is complete.

To contribute to the Spoken BNC2014 project as a participant please email corpus@cambridge.org for more information.

Further explorations in ‘the Muslim world’

Doing a ten minute presentation is pretty tough – you have to be equally ruthless about what you leave out and what you include. But the benefits are potentially great – if you can present an idea well in ten minutes you are pretty sure that you will have your viewer’s attention. As anybody who has lectured knows, with longer talks, no matter how strong your delivery, attention starts to wander for some in the audience as the talk progresses! So when I had the opportunity to do a talk of 10-18 minutes for Lancaster TEDx, I immediately went for the option of 10 minutes. It was a nice challenge for me and I thought that the brevity of the talk would help me to get my message across. So I beavered away for a few weeks putting things in and taking things out, thinking about key messages and marshalling my data: if my TEDx talk looks spontaneous … it was not. In fact I imagine few of them really are, in spite of them being presented in such a way as to make it appear that they are. A lot of work goes into them – and that is just from the speakers. The crew who organized and filmed the event at Lancaster worked amazingly hard as well.

So was it worth it? Well, I have had many kind notes since I did the talk thanking me for it. I have also had a fair number of views of my talk on-line and many, many more likes than dislikes. So for me the answer is an emphatic ‘yes’, it was worth it. Many thanks to all who have viewed and publicised my talk.

Reading the comments has been an interesting experience – many are appreciative. Yet some simply show that some of the argument was ignored or not picked up by the watcher – so a watcher asks if religious identity is important to athletic performance in response to a point I make about the failure of the UK press to report on Mo Farah’s Muslim identity. Though I thought I made it clear that that identity is one Farah himself says is central to his athletic achievements and hence, yes, it is relevant, it seems that perhaps my optimism that a ten minute talk would deal with attention span issues was misplaced! For some of these mistaken queries other commenters set the record straight, which is kind of them.

Of slightly more interest are some of the questions that get thrown up – I will consider three here. Firstly: what about the term the West? I was glad this was picked up by a viewer as we discuss that in the book that my talk is based upon (Baker, Gabrielatos and McEnery, 2012:131-132). As a self-referential term it does have a role to play in setting up the ‘us’ that is opposed to the ‘them’ of the Muslim world. Another viewer asks whether Muslim world is just a neutral term used to define a culturally homogeneous region. This is a dangerous argument. It takes us to the precipice of the very ‘us and them’ distinction I was discussing. It is dangerous precisely because it is simplistic in nature, as it implies a homogeneous and distinct other (there are non-Muslims who live in the so-called Muslim world, for example – the area referred to is not homogeneous in oh so many ways). It also misses the point – if this were simply a neutral referring expression perhaps the ‘us and them’ distinction would not be so powerful. The problem is it is a very powerful term for generating an ‘us and them’ distinction because it sets Muslims in opposition to non-Muslims in the language and, as noted, it homogenizes Muslims – they are all the same and the reporting of the views of the Muslim world entrenches this monolithic view also (see Baker, Gabrielatos and McEnery, 2012:130). Finally, the same viewer wonders why I did not talk about the change of meaning of words over time. The answer to that one is easy – sadly, as shown in the later part of the talk, the attitudes I was talking about have not changed over time, even though I would have been happy to say that they had if this were true. The viewer also uses the word ‘gay’ as an interesting example of change in meaning over time – well, that would have been another talk to give. A lot of nonsense is spoken about this word – it is usually presented as a word that had a simple, innocent, meaning until another, less innocent meaning came along and spoilt it, a view hilariously lampooned by Stephen Fry and Hugh Laurie in one of their sketches.

However, this is not true – gay had far from innocent meanings in the past – a quick perusal of Jonathan Green’s excellent Chambers Slang Dictionary shows that. So yes, a discussion of word meaning change over time would have been interesting and debunking a few myths about the word gay would have been fun too – but that was not what my talk was about, so I shall leave the matter there. Maybe for a future TEDx? Who knows.

So – ten minute talks have their pluses and minuses. They are great for getting your message out and, by and large, I am happy with how my talk went. I found the experience of giving a TEDx talk a very positive one and many other people clearly enjoyed it also. Best of all, it has made people think about and discuss their use of language, and that is something which always pleases me!


John Sinclair lecture: “Primed for Violence? A corpus analysis of jihadist discourse”

It was a great honour to give this year’s Sinclair Lecture at Birmingham University. I have long been an admirer of John’s work – there are many ideas he developed that are well worth critically engaging with. So to be asked to give a talk in his memory and honour was a challenge I happily took on. I chose the topic of the talk carefully – John liked groundbreaking work and was a producer of daring and new ideas. So I thought an off-the-shelf piece of work was not right for this talk – it was more in keeping with the event to give a talk on a piece of work in progress. Having made that decision I then knew I should talk on the work I am developing on language and violence.

Language and violence is, in my view, a terribly under-researched topic. It is also an area which, sadly, has on-going relevance to human society. More positively, it is a topic on which linguists can – and to some extent do – provide insights. This talk was given in that spirit. I aimed to show, as one would rightly expect of the ESRC Centre for Corpus Approaches to Social Science, the insights that a corpus approach may provide to an issue which reaches across the social sciences and beyond. In doing so I had to work with some quite challenging data. I think, however, that the results are at least indicative both of how this area of research may be opened up and how linguists may contribute to its exploration. I think John would have liked this bold new venture and so I feel very comfortable in dedicating this talk to his memory.

Abstract:

‘How are people persuaded to be violent? How might a small group of people influence members of a larger group of people to behave in ways that they may normally find abhorrent? This talk looks at these questions, which are typically summarised as ‘radicalization’, using the example of jihadist language. I will explore how language may be manipulated in order to legitimate violent acts against certain groups or individuals in jihadist materials. However, I will also be exploring the important claim that there is a direct link between what the jihadists write and what other Muslims write, an assumption held by policy makers, academics and the media.

This talk examines how we look for linguistic evidence of this process, with an emphasis upon incitement to violence. If there is evidence that the manipulation of language in jihadist writing leads to a corresponding adaptation in either the Muslim mainstream media or the writing of ordinary Muslims over time, then we may begin to accept and understand with some linguistic sophistication what is at the moment assumed by many. We may also, however, be able to see how such radicalization is resisted, and hence better understand the process of resistance to radicalization also.

Central to my account of incitement to violence are the linked ideas of collocation and lexical priming. Together these begin to explain, I will argue, both the rhetorical process around incitement to violence and the broader dynamics in discourse that alienate and leave open to persuasion sections of society that may be persuaded to undertake violent acts.

My exploration is based on tens of thousands of words of corpus material, including i.) transcripts of so-called ‘martyrdom’ videos; ii.) texts by those who exhort jihadists to acts of violence; iii.) Muslim news media and iv.) comment data from the Muslim news media. By drawing upon a range of sources like this, I will be better able to characterise the competing forces being brought to bear as different groups try to influence mainstream Muslim discourse.’