Dispatch from YLMP2014


I recently had the pleasure of travelling to Poland to attend the Young Linguists’ Meeting in Poznań (YLMP), a congress for young linguists who are interested in interdisciplinary research and stepping beyond the realm of traditional linguistic study. Hosted over three days by the Faculty of English at Adam Mickiewicz University, the congress featured over 100 talks by linguists young and old, including plenary lectures by Lancaster’s very own Paul Baker and Jane Sunderland. I was one of three Lancaster students to attend the congress, along with undergraduate Agnes Szafranski and fellow MA student Charis Yang Zhang.

What struck me about the congress, aside from the warm hospitality of the organisers, was the sheer breadth of topics covered over the weekend. All of the presenters could rightly describe their work as linguistics, but perhaps for the first time I saw just how many domains the discipline can be applied in. At least four sessions ran in parallel at any given time, and themes ranged from gender and sexuality to EFL and even psycholinguistics. There were optional workshops as well as six plenary talks. On the second day of the conference, as part of the language and society stream, I presented a corpus-assisted critical discourse analysis of the UK national press reporting of the immediate aftermath of the May 2013 murder of soldier Lee Rigby. I was happy to have a lively and engaged audience with some really interesting questions for me at the end, and I enjoyed the conversations that followed at the reception in the evening!

What was most encouraging about the congress was the drive and enthusiasm shared by all of the ‘young linguists’ in attendance. I now feel part of a generation of young minds who are hungry to improve not only our own work but hopefully, in time, the field(s) of linguistics as a whole. After my fantastic experience at the Boya Forum at Beijing Foreign Studies University last autumn, I was happy to spend time again celebrating the work of undergraduate and postgraduate students, and early-career linguists. There was a willingness to listen, to share ideas, and to (constructively) criticise where appropriate, and as a result I left Poznań feeling very optimistic about the future of linguistic study. I look forward to returning to the next edition of YLMP, because from what I saw at this one, there is a new generation of linguists eager to push the investigation of language to the next level.

How to be a PhD student (by someone who just was), Part 2: Managing your work and working relationships

After submitting and successfully defending my thesis a few months ago, I’ve decided to share some ‘lessons learnt’ over the course of my 38 months as a PhD student. 

In Part 2 of this series, I’ll talk about best practices for structuring your work, managing your relationship with your supervisor, and my experience with teaching undergraduates. If you missed “Part 1: Preparing for the programme”, you can read it here.



Structuring your work

I believe it’s healthy to treat your PhD—as much as possible—like a job. Like any job, a PhD has physical, social, and temporal boundaries.

Try to create a PhD ‘space’. Make use of your office if you’ve been given one at your university, and create a space within your home that is a ‘work area’ if you haven’t been given one. Working from bed, from the sofa, or from a café means that your PhD is infiltrating all areas of your life. While some degree of this is inevitable, it’s best to keep physical boundaries as much as possible, even if you can only keep it to your desk.

By the same token, making friends outside of your department or your field is helpful in many ways. I adore my friends from Linguistics and I couldn’t have finished my doctorate without them, but just as you wouldn’t hang out only with colleagues once you’re home from work, the same applies here. In a group of people who share a background, you might end up talking about your field ‘outside of hours’. This can be stimulating, but also exhausting. You may want to vent about your department, or talk about something other than your PhD or field, even trashy TV! That’s easier with friends from other areas. As a bonus, the connections that you make outside of your field can also help you inside it. I’ve had very good advice from friends working in statistics, gotten ideas from historians, and been inspired by literary scholars, even though I might never venture into these areas in the library.

If you can, also create a routine for yourself, even if this isn’t 9-5. It’s best if this routine involves physically moving locations, but even if it doesn’t, physically change something: take a shower, get dressed for work. Pick the 8 hours within the day that you work best, and work during those hours. Don’t be too hard on yourself if you have a short day or miss days out entirely… a PhD is ‘swings and roundabouts’, as they say around here… it’s long enough that you will make the time up. As much as possible, take the weekends and holidays off. This might mean working longer than 8 hours on weekdays, but personally, I think it’s worth it. Many people study in a place far from where they grew up, and a PhD is one time in life where you can be flexible enough with your time to enjoy a bit of sightseeing and tourism.

During this routine, set clear goals for yourself. I’ve seen people arguing for and against writing something every day. I found it very helpful to set a daily word count goal for myself, then sit in front of a computer until I at least came close. The number isn’t important: at the start of my PhD, I aimed to write 200 words per day; at the end of my PhD, I was able to write 1,000 words per day. What is important is getting into a routine. You will sit down some days and feel horrible. You’ll have writer’s block. You will struggle through each word of those 200, and know that you’ll delete most of them. But it’s much easier to get 40 great words out of 200 bad ones than to write 40 words completely cold. I’ve written entire chapters three times as long as they needed to be, and hated them. But paring them down is cathartic—it’s like sculpting. The bonus is that when you get into the habit of writing every day, you slowly get into the habit of writing something good every day. Soon, you’ll be writing 100 words and keeping 50 of them. Then you’ll be writing 1,000 words and keeping 900 of them. The important part is keeping the pace: just write! Your supervisor will also appreciate having something tangible to mark your progress (see next section).

As for the structure of my own work, there are three things that I would do differently if I could do it all again:

  1. Decide on a reference manager and stick to it diligently from Day 1. At the start of my degree I used EndNote for reference management, as this was offered for free by my university and came in both desktop and web versions. For my whole first year, I used EndNote to create an annotated bibliography—an extremely useful tool when drafting your literature review. However, EndNote began crashing on me, and papers were no longer available. In my second year, I stopped keeping track of references and just kept haphazard folders of PDFs. In my third year, I just used in-line citations, believing that sources would be easy to find later on. Not true! The month before submission I decided to make the leap to Mendeley, a truly amazing (free) reference manager that allows you to build and share libraries, store your PDFs, search other people’s collections, and select from a vast array of output styles (I favour APA 6th edition). The transition was extraordinarily painful. Exporting from EndNote was problematic and buggy, scanning PDFs in Mendeley was error-prone, and finding the corresponding works for those in-line references was impossible in some cases. I wasted a solid week just before submission sorting out my references, when this really should have been done all along. It would have been so painless!
  2. Master MS Word early on. In my final year, I finally got serious about standardising the numbering of my tables and figures, which means that in the eleventh hour, I was still panicking, trying to make sure that I had updated everything to the proper styles and made appropriate in-line references to my data. Had I set my styles earlier on and made the best use of MS Word’s quite intuitive counting and cross-referencing mechanisms, I would have saved myself days of close reading. If you are using MS Word (sorry, I can’t say anything about LaTeX) and you are not using the citation manager or cross-reference tool, learn how to do that immediately. Today. Your library might have a class on it, or, like me, you can brush up in an hour of web searching.
  3. Put down the books earlier. At a certain point, you need to generate new research and make a novel contribution to knowledge. Your first year and much of your second year will be dedicated to making sure that a research gap exists, and that you can pay tribute to all of the giants whose shoulders you will be standing on. However, burying yourself in a library for three years reading everyone else’s great works is a good way to paralyse yourself. Of course you will always need to keep up with the times, but at a certain point, your rate of writing will overtake your rate of reading. If I could do it again, I would follow a pattern more like this:

[Figure: the balance tipping from reading to writing over the three years of the PhD]

After the first year, you won’t be missing anything totally fundamental. After the second year, you won’t be missing anything peripheral. If, in the third year, you’ve missed something very fresh, your examiners will point it out. But the more important thing is to make a contribution. Most of the PhD is research, not literature review. Your supervisor will be able to help you with this, and with some other things (but not all of them), as I discuss below.

Managing your relationship with your supervisor

Continue reading

“My research trip to the CASS centre” by visiting PhD student Anna Mattfeldt

Several times a year, the ESRC Centre for Corpus Approaches to Social Science welcomes visiting researchers, from PhD students to professors. Past visitors include Will Hamlin (Washington State University, USA) and Iuliia Rudych (Albert-Ludwigs-Universität Freiburg, Germany); current visitors include Laurence Anthony (Waseda University, Japan) and Anna Mattfeldt (Heidelberg University, Germany). Before returning to her home university, Anna wanted to share a few thoughts about her experience here at CASS:


I am a PhD student from Heidelberg who has just spent eight wonderful weeks at Lancaster University on a research trip. Before I went, some friends and colleagues asked me why I would go to so much trouble when I could just as easily write my thesis back home in Heidelberg. In the following post, I will try to answer why a research trip to another country and another university was the right decision for me – and why I can absolutely recommend it to other PhD students as well. I would also like to thank my main supervisor, Prof. Ekkehard Felder, for giving me the great chance to spend these eight weeks of research here at Lancaster.

I am doing my PhD at the German department of Heidelberg University. We have been doing corpus linguistic research in discourse analysis for quite some time, with big thematic corpora like HeideKo that were collected for research and teaching purposes. A bilingual corpus project, focusing on the depiction of Europe in German and Hungarian newspapers, is currently under way with the German department of ELTE in Budapest, Hungary.

We approach data from a mainly qualitative point of view, accompanied by quantitative analysis. We focus on so-called “semantic battles” in a pragma-semiotic approach, which means that we try to find instances of disagreement or agreement between speakers and how these are played out on the linguistic surface level. Some of these battles come up so often in specific discourses that they can be seen as central to the discourse. We are interested in the concepts behind the discourse, and how we can deduce them from the actual linguistic devices used in texts.

In my PhD, I am looking at environmental media discourses (especially those concerning Hurricane Sandy and hydraulic fracturing, so-called “fracking”, in the US, the UK and Germany) in order to do a linguistic discourse analysis. Moreover, I am trying to find a way to detect conflictive topics and concepts in the various discourses. So, for a project that focuses on different languages, corpora and research questions, I need corpus linguistic software like Wmatrix, AntConc, CQPweb and WordSmith. My co-supervisor, Prof. Busse, recommended a stay with the ESRC Centre for Corpus Approaches to Social Science at Lancaster University, which is known for its expertise with huge corpora and many different kinds of software. This is how I came up with the idea of looking for support beyond my home university.

Hence I sent an email to Tony McEnery. To my great delight, after sending in a few documents, I was actually invited to come and do some research here. After figuring everything out at work, sending applications for scholarships to fund all this and chatting online with local property owners, I finally arrived on the 15th February and spent eight amazing weeks here.

The CASS centre has helped me a lot in my research, especially with tricky data. I was also exposed to lots of interesting ideas, and I loved the atmosphere of picking one another’s brains and inspiring one another. I liked the working atmosphere, the many interesting talks that were given, the wonderful library with all the literature of the different fields, and last but not least the beautiful campus in an idyllic landscape. I was inspired to work more closely with quantitative approaches and to see how they could be used to reveal the bigger concepts “between the lines”. I also got a lot of my analysis done, made a lot of progress, and still managed to see a bit of England during the weekends.

Thus, I can wholeheartedly recommend going abroad during a PhD for a research trip:

  • You get to talk to experts who can help you find solutions for the challenges you have been stuck with.
  • You get lots of new ideas just by talking to different people, being in a new environment or experiencing a different research philosophy.
  • Believe it or not, it immensely furthers the writing process to work in a new environment without any distractions.
  • If you are going to a country with a different language from your own, it is a great opportunity to brush up your language skills.
  • You broaden your horizons by living abroad, not only as far as your PhD is concerned.

So if you feel that you can profit in any way by going abroad, I recommend you do that – and hopefully come to Heidelberg! If you have any further questions concerning my project or visiting Heidelberg University for your own research trip, just send me an email (anna.mattfeldt at gs.uni-heidelberg.de).


Are you interested in being a visiting researcher/scholar at CASS? Email us at cass@lancs.ac.uk to discuss research aims and availability.

How to be a PhD student (by someone who just was), Part 1: Preparing for the programme

In December 2013, after three years and two months of work, I submitted my PhD thesis. Last month, I successfully defended it, and made the (typographical) corrections in two nights. I’m a Doctor! It’s still exciting to say.

A PhD is certainly not easy — I’ve heard it compared to giving birth, starting and ending a relationship, riding a rollercoaster, making a lonely journey, and more. I relocated across the world from Australia to begin mine, and the start was marked by the sadness of a death in the family. It’s been a whirlwind ever since; throughout the course of my degree, I taught as much as possible, I researched and published outside the scope of my PhD, and in April 2013, I began full-time work in the ESRC Centre for Corpus Approaches to Social Science.

The question that I get most often is a question that I found myself asking for years: how? How do you do a PhD? How do you choose a programme and keep from looking back? How do you keep close to the minimum submission date (or at least keep from going beyond the maximum submission date)? How do you balance work and study? I’d like to share a short series (in three installments) about my degree and my lessons learned. There are many resources out there for people doing PhDs, but I wasn’t able to find any that described my experience. I hope that this might help some others who are [metaphorical representation of your choice] a PhD. Before beginning, I’d just like to stress that these resonate with my personal experience (and with those of many of my friends), but won’t align with everyone’s circumstances.

The first installment is five pointers about what to do when applying to a programme.

Continue reading

Using version control software for corpus construction

There are two problems that often come up in collaborative efforts towards corpus construction. First, how do two or more people pool their efforts simultaneously on this kind of work – sharing the data as it develops without working at cross-purposes, repeating effort, or ending up with incompatible versions of the corpus? Second, how do we keep track of what changes in the corpus as it grows and approaches completion – and in particular, if mistakes get made, how do we make sure we can undo them?

Typically, corpus linguists have used ad hoc solutions to these problems. To deal with the problem of collaboration, we email bundles of files back and forth, or use shared directories on our institutional networks, or rely on external cloud services like Dropbox. To deal with the problem of recording the history of the data, we often resort to saving multiple different versions of the data, creating a new copy of the whole corpus every time we make any tiny change, and adding an ever-growing pile of “v1”, “v2”, “v3”… suffixes to the filenames.

In this blog post I’d like to suggest a better way!

The problems of collaboration and version tracking also affect the work of software developers – with the difference that for them, these problems have been quite thoroughly solved. Though software development and corpus construction are quite different animals, in two critical respects they are similar. First, we are working mainly with very large quantities of plain text files: source code files in the case of software, natural-language text files in the case of corpora. Second, when we make a change, we typically do not change the whole collection of files but only, perhaps, some specific sections of a subset of the files. For this reason, the tools that software developers use to manage their source code – called version control software – are in my view eminently suitable for corpus construction.

So what is version control software?

Think of a computer filesystem – a hierarchy of folders, subfolders and files within those folders which represents all the various data stored on a disk or disks somewhere. This is basically a two-dimensional system: files and folders can be above or below one another in the hierarchy (first dimension), or they can be side-by-side in some particular location (second dimension). But there is also the dimension of time – the state of the filesystem at one point in time is different from its state at a subsequent point in time, as we add new files and folders or move, modify or delete existing ones. A standard traditional filesystem does not have any way to represent this third dimension. If you want to keep a record of a change, all you can do is create a copy of the data alongside the original, and modify the copy while leaving the original untouched. But it would be much better if the filesystem itself were able to keep a record of all the changes that have been made, and all of its previous states going back through history – and if it did this automatically, without the user needing to manage different versions of the data manually.

Windows and Mac OS X both now have filesystems that contain some features of this automatic record-keeping. Version control software does the same thing, but in a more thorough and systematic way. It implements a filesystem with a complete, automatic record of all the changes that are made over time, and provides users with easy ways to access the files, see the record of the changes, and add new changes.

I personally encountered version control software for the first time when I became a developer on the Corpus Workbench project back in 2009/2010. Most of the work on CWB is done by myself and Stefan Evert, and although we do have vaguely defined areas of individual responsibility for different bits of the project, there is also a lot of overlap. Without version control software, effective collaboration and tracking the changes we each make would be quite impossible. The whole of CWB including the core system, the supplementary tools, the CQPweb user interface, and the various manuals and tutorials, is all version-controlled. UCREL also uses version control software for the source code of tools such as CLAWS and USAS. And the more I’ve used version control tools for programming work, the more convinced I’ve become that the same tools will be highly useful for corpus development.

The version control system that I prefer is called Subversion, also known by the abbreviation SVN. This is quite an old-fashioned system, and many software developers now use newer systems such as Mercurial or Git (the latter is the brainchild of Linus Torvalds, the mastermind behind Linux). These newer and much more flexible systems are, however, quite a bit more complex and harder to use than Subversion. This is fine for computer programmers using the systems every day, but for corpus linguists who only work with version control every now and then, the simplicity of good old Subversion makes it – in my view – the better choice.

Subversion works like this. First, a repository is created. The repository is just a big database for storing the files you’re going to work with. When you access this database using Subversion tools, it looks like one big file system containing files, folders and subfolders. The person who creates and manages the repository (here at CASS that’s me) needs a fair bit of technical expertise, but the other users need only some very quick training. The repository needs to be placed somewhere where all members of the team can access it. The CASS Subversion repository lives on our application server, a virtual machine maintained by Lancaster University’s ISS; but you don’t actually need this kind of full-on setup, just an accessible place to put the database (and, needless to say, there needs to be a good backup policy for the database, wherever it is).
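To make this concrete, here is a minimal sketch of that setup step using Subversion’s command-line tools (introduced properly below). The paths, URL and corpus name are hypothetical; a repository manager would adapt them to wherever the database actually lives:

    # Create the repository database (done once, by the repository manager)
    svnadmin create /srv/svn/corpora

    # Import an existing corpus directory as the first version
    svn import ./my-corpus file:///srv/svn/corpora/my-corpus -m "Initial import of corpus files"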

The repository manager then creates usernames that the rest of the team can use to work with the files in the repository. When you want to start working with one of the corpora in the repository, you begin by checking out a copy of the data. This creates a working copy of the repository’s contents on your local machine. It can be a copy of the whole repository, or just a section that you want to work on.  Then, you make whatever additions, changes or deletions you want – no need to keep track of these manually! Once you’ve made a series of changes to your checked-out working copy, you commit it back into the repository. Whenever a user commits data, the repository creates a new, numbered version of its filesystem data. Each version is stored as a record of the changes made since the previous version. This means that (a) there is a complete record of the history of the filesystem, with every change to every file logged and noted; (b) there is also a record of who is responsible for every change. This complete record takes up less disk space than you might think, because only the changes are recorded. Subversion is clever enough not to create duplicate copies of the parts of its filesystem that have not changed.
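In command-line terms (more on the available tools below), a typical working session looks something like the following sketch; the repository URL and filenames are invented for illustration:

    # Check out a working copy of one corpus from the repository
    svn checkout http://svn.example.org/corpora/my-corpus my-corpus
    cd my-corpus

    # ... edit files, add new ones ...
    svn add texts/interview-042.txt    # tell Subversion about a new file

    # Commit your changes back, creating a new numbered version
    svn commit -m "Added interview 42; corrected speaker tags in interviews 30-35"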

Nothing is ever lost or deleted from this system. Even if a file is completely removed, it is only removed from the new version: all the old versions in the history still contain it. Moreover, it is always possible to check out a version other than the current one – allowing you to see the filesystem as it was at any point in time you choose. That means that all mistakes are reversible. Even if someone commits a version where they have accidentally wiped out nine-tenths of the corpus you are working on, it’s simplicity itself just to return to an earlier point in history and roll back the change.
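A sketch of what inspecting and rolling back history looks like (the revision numbers here are hypothetical):

    # Browse the history: every version, with its author and log message
    svn log

    # Look at the corpus exactly as it was at revision 120
    svn checkout -r 120 http://svn.example.org/corpora/my-corpus my-corpus-r120

    # Undo bad changes by reverse-merging your working copy back to revision 120,
    # then committing the result as a new version (the history itself is untouched)
    svn merge -r HEAD:120 .
    svn commit -m "Rolled back accidental deletion; corpus restored to state of revision 120"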

The strength of this approach for collaboration is that more than one person can have a checked-out copy of a corpus at the same time, and everyone can make their own changes separately. To check whether someone else has committed changes while you’ve been working, you can update your working copy from the repository, getting the other person’s changes and merging them with yours. Even if you’ve made changes to the same file, they will be merged together automatically. Only if two of you have changed the same section of the same file is there a problem – and in this case the program will show you the two different versions, and allow you to pick one or the other or create a combination of the two manually.
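Again as a hypothetical session:

    # Fetch collaborators' committed changes and merge them into your working copy
    svn update

    # If Subversion reports a conflict in a file you both edited, open the file,
    # settle on the text you want, then tell Subversion it is resolved:
    svn resolve --accept=working texts/interview-042.txt
    svn commit -m "Merged my tag corrections with the header changes"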

While Subversion can do lots more than this, for most users these three actions – check out, update, and commit – are all that’s needed. You also have a choice of programs that you can use for these actions. Most people with Unix machines use a command-line tool called svn which lets you issue commands to Subversion by typing them into a shell terminal.

On Windows, on the other hand, the preferred tool is something called TortoiseSVN. This can be downloaded and installed in the same way as most Windows programs. However, once installed, you don’t have to start up a separate application to use Subversion. Instead, the Subversion commands are added to the right-click context menu in Windows Explorer. So you can simply go to an empty folder, right-click with the mouse, and select the “check out” option to get your working copy. Once you’ve got a working copy, right-clicking on any file or folder within it allows you to access the “update” and “commit” options. TortoiseSVN provides an additional sub-menu which lets you access the full range of Subversion commands – but, again, normal users only need those three most common commands.

The possibility of using TortoiseSVN on Windows means that even the least tech-savvy member of your team can become a productive user of Subversion with only a very little training. And the benefits of building your corpus in a Subversion repository are considerable:

  • The corpus is easily accessible and sharable between collaborators
  • A complete record of all changes made, plus who-did-what
  • Any change can be reversed if necessary, with no need to manually manage “old versions”
  • Full protection against accidental deletions and erroneous changes
  • A secure and reliable backup method is only needed for the repository itself, not for each person’s working copy

That’s not to mention other benefits, such as the ease of switching between computers (just check out another working copy on the new machine and carry on where you left off).

Here at CASS we are making it our standard policy to put corpus creation work into Subversion, and we’re now in the process of gradually transitioning the team’s corpus-building efforts across into that platform. I’m convinced this is the way of the future for effectively managing corpus construction.

Trinity oral test corpus: The first hurdle

At Trinity we are wildly excited – yes, wildly – to finally have our corpus project set up with CASS. It’s a unique opportunity to create a learner corpus of English based on some fairly free-flowing L2 language which is not too constrained by the testing context. All Trinity oral tests are recorded, and most of the tests include one or two tasks where the candidate has free rein to talk about their own interests in their own way – very much their own contributions, expressed as themselves. We have been hoping to use what is referred to as our ‘gold dust’ for research that will be meaningful – not just to the corpus community but also in terms of the impact on our tests and our feedback to learners and teachers. Working with CASS has now given us this golden opportunity.

The project is now up and running and in the corpus-building stage, and we have moved from the heady excitement of imagining what we could do with all the data to the grindstone of pulling together all the strands of metadata needed to make the corpus robust and useful. The challenges are real – for example, we need to log first languages, but how do we ensure reliability? Metadata is now opt-in in most countries, so how do we capture everyone? Even when the data boxes are completed, how do we know the answers are true? No, the only way is the very non-technological method of contacting the students again and following up in person.

A related concern is whether the metadata we need has shifted. We would normally be interested in what kind of input students have had to their learning – e.g. how many years of study. In the past, part of this data gathering was to ask about the time learners had spent in an English-speaking country. Should this now shift to time spent watching online videos in English, using social media, or reading online sources? What is relevant – and also collectable?

The challenges in what might be considered this non-core information are forcing us to re-examine how sure we are about influences on learning – not just from our perspective but from the learner’s perspective as well.

Writing for the press: the deleted scenes

In late July and early August 2013, the stories of Caroline Criado-Perez, the bomb threats, and latterly, the horrific tragedy of Hannah Smith broke across the media, and as a result, the behaviour supposedly known as “trolling” was pitched squarely into the limelight. There was the inevitable flurry of dissections, analyses, and opinion pieces, and no doubt like any number of academics in similar lines of work, I was asked to write various articles on this behaviour. Some I turned down for different reasons, but one that I accepted was for the Observer. (Here’s the final version that came out in both the Observer and the Guardian.)

Like the majority of people, I had been mostly in the dark about how the media works behind the scenes. That said, throughout my time at university, I have studied areas like Critical Discourse Analysis and the language of the media, and over the past three years, my work has been picked up a few times in small ways by the media, so I probably had a better idea than many. I realise now, however, that even with this prior knowledge, I was still pretty naive about the process. I wasn’t too surprised, then, when I got a number of comments on the Observer article raising exactly the sorts of questions I too would have asked before I’d gone through what I can only describe as a steep media learning curve. There were, essentially, three main issues that kept recurring:

(1) Why didn’t you talk about [insert related issue here]? This other thing is also important!

(2) Why didn’t you define trolling properly? This isn’t what I’d call trolling!

(3) Why did you only mention the negative types of trolling? There are good kinds too!

All three questions are interrelated in various ways, but I’ve artificially separated them out because each gives me a chance to explain something that I’ve learned about what happens behind the scenes during the process of producing media content.

Continue reading

Further explorations in ‘the Muslim world’

Doing a ten minute presentation is pretty tough – you have to be equally ruthless about what you leave out and what you include. But the benefits are potentially great – if you can present an idea well in ten minutes you are pretty sure that you will have your viewer’s attention. As anybody who has lectured knows, with longer talks, no matter how strong your delivery, attention starts to wander for some in the audience as the talk progresses! So when I had the opportunity to do a talk of 10-18 minutes for Lancaster TEDx, I immediately went for the option of 10 minutes. It was a nice challenge for me and I thought that the brevity of the talk would help me to get my message across. So I beavered away for a few weeks putting things in and taking things out, thinking about key messages and marshalling my data: if my TEDx talk looks spontaneous… it was not. In fact I imagine few of them really are, in spite of them being presented in such a way as to make it appear that they are. A lot of work goes into them – and that is just from the speakers. The crew who organized and filmed the event at Lancaster worked amazingly hard as well.

So was it worth it? Well, I have had many kind notes since I did the talk thanking me for it. I have also had a fair number of views of my talk on-line and many, many more likes than dislikes. So for me the answer is an emphatic ‘yes’, it was worth it. Many thanks to all who have viewed and publicised my talk.

Reading the comments has been an interesting experience – many are appreciative. Yet some simply show that parts of the argument were ignored or not picked up by the watcher – so one watcher asks whether religious identity is important to athletic performance, in response to a point I make about the failure of the UK press to report on Mo Farah’s Muslim identity. Though I thought I made it clear that this identity is one Farah himself says is central to his athletic achievements and hence, yes, it is relevant, it seems that perhaps my optimism that a ten minute talk would deal with attention span issues was misplaced! For some of these mistaken queries other commenters set the record straight, which is kind of them.

Of slightly more interest are some of the questions that get thrown up – I will consider three here. Firstly: what about the term the West? I was glad this was picked up by a viewer as we discuss that in the book that my talk is based upon (Baker, Gabrielatos and McEnery, 2012:131-132). As a self-referential term it does have a role to play in setting up the ‘us’ that is opposed to the ‘them’ of the Muslim world. Another viewer asks whether Muslim world is just a neutral term used to define a culturally homogeneous region. This is a dangerous argument. It takes us to the precipice of the very ‘us and them’ distinction I was discussing. It is dangerous precisely because it is simplistic in nature, as it implies a homogeneous and distinct other (there are non-Muslims who live in the so-called Muslim world, for example – the area referred to is not homogeneous in oh so many ways). It also misses the point – if this were simply a neutral referring expression, perhaps the ‘us and them’ distinction would not be so powerful. The problem is that it is a very powerful term for generating an ‘us and them’ distinction, because it sets Muslims in opposition to non-Muslims in the language and, as noted, it homogenizes Muslims – they are all the same – and the reporting of the views of the Muslim world entrenches this monolithic view (see Baker, Gabrielatos and McEnery, 2012:130). Finally, the same viewer wonders why I did not talk about the change of meaning of words over time. The answer to that one is easy – sadly, as shown in the later part of the talk, the attitudes I was talking about have not changed over time, even though I would have been happy to say that they had if this were true. The viewer also uses the word ‘gay’ as an interesting example of change in meaning over time – well, that would have been another talk to give. A lot of nonsense is spoken about this word – it is usually presented as a word that had a simple, innocent meaning until another, less innocent meaning came along and spoilt it, a view hilariously lampooned by Stephen Fry and Hugh Laurie in this sketch:

However, this is not true – gay had far from innocent meanings in the past, as a quick perusal of Jonathon Green’s excellent Chambers Slang Dictionary shows. So yes, a discussion of change in word meaning over time would have been interesting, and debunking a few myths about the word gay would have been fun too – but that was not what my talk was about, so I shall leave the matter there. Maybe for a future TEDx? Who knows.

So – ten minute talks have their pluses and minuses. They are great for getting your message out and, by and large, I am happy with how my talk went. I found the experience of giving a TEDx talk a very positive one and many other people clearly enjoyed it also.  Best of all, it has made people think about and discuss their use of language, and that is something which always pleases me!

Watch my full TEDxLancasterU talk here:

Web of words: A short history of the troll

Over the past fortnight, various broadsheets and media outlets (see bibliography) picked up the story of my recent article, ‘“Uh…..not to be nitpicky,,,,,but…the past tense of drag is dragged, not drug.”: An overview of trolling strategies‘ (2013), which came out in the Journal of Language Aggression and Conflict. Of the thousands of comments collectively posted on those articles, one particularly interesting point that came through (out of many) was the general sense that there exists a single, fixed, canonical definition of the word troll which I ought to be using and had somehow missed.

So what is the definition of troll? In my thesis, I spent a rather lengthy 18,127 words trying to answer precisely this question, and very early on I realised that trying to discover, or, if one didn’t exist, to create a clean, robust, working definition that everyone would agree with would be close to impossible. There are at least three major problems, which for simplicity’s sake are best referred to as history, agreement, and change.

Continue reading

Beyond ‘auto-complete search forms’: Notes on the reaction to ‘Why do white people have thin lips?’

As Paul Baker reported yesterday, a paper that we co-authored entitled “‘Why do white people have thin lips?’ Google and the perpetuation of stereotypes via auto-complete search forms” (published 2013 in Critical Discourse Studies 10:2) has recently been garnering some media attention, being cited in the Mail Online and the 18 May 2013 print issue of The Daily Telegraph (image below). Our findings — that “the auto-complete search algorithm offered by the search tool Google can produce suggested terms which could be viewed as racist, sexist or homophobic” — come as a German court “said Google must ensure terms generated by auto-complete are not offensive or defamatory” (BBC News, 14 May 2013).  Similar, earlier, cases of (personal) libel and defamation were recalled by both Paul and me during the process of our investigation, but — serious as it may be — the thrust of this study was not the potential for damage to individuals, but rather to entire social groups. We found that:

“Certain identity groups were found to attract particular stereotypes or qualities. For example, Muslims and Jewish people were linked to questions about aspects of their appearance or behaviour, while white people were linked to questions about their sexual attitudes. Gay and black identities appeared to attract higher numbers of questions that were negatively stereotyping.”

The nature of Google auto-complete is such that the content presented appears because a relatively high number of previous users have typed these strings into the search box. We argue, then, that the appearance of such a high frequency of (largely negatively) stereotyping results indicates that “humans may have already shaped the Internet in their image, having taught stereotypes to search engines and even trained them to hastily present these as results of ‘top relevance’.” This finding has been somewhat misinterpreted by the press; the short title revealed in the URL for the Mail Online article and used in the top ticker — ‘Is Google making us RACIST?’ — actually reverses the agency in this process, as we have argued that, in fact, users may have made Google racist.

This ties in to the main suggestion that we make in the conclusion of the article, that “there should be a facility to flag certain auto-completion statements or questions as problematic”, much the same as the ‘down-votes’ utilised in the Google-owned and -operated site YouTube. The argument here being: if auto-complete results have been crowd-sourced from Google users, why not empower the same users to work as mass moderators?

The other main point in our conclusion section was that this was not (and could not have been) a reception study “in that we are unable to make generalisations about the effects on users of encountering unexpected auto-complete question forms in Google”, but that this was an area ripe for further research.

“Hall’s (1973) notion of dominant, oppositional and negotiated resistant readings indicates that audiences potentially have complex and varying reactions to a particular ‘text’. As noted earlier, we make no claim that people who see questions which contain negative social stereotypes will come to internalise such stereotypes. A similar-length (at least) paper to this one would be required to do justice to how individuals react to these question forms. And part of such a reception study would also involve examining the links to various websites which appear underneath the auto-completed questions. Do such links lead to pages which attempt to confirm or refute the stereotyping questions?”

In short, we had found that Google auto-complete did offer a high frequency of (largely negative) stereotyping questions, and did not offer a way for users to problematise these at the point of presentation. What we did not find was that “Google searches ‘boost prejudice’”, though we did hope to spark a discussion on the topic, and to indicate that the field is open for researchers willing to conduct reception studies.

[Image: The Daily Telegraph, 18 May 2013]

Nic Subtirelu, a PhD student in the Department of Applied Linguistics and ESL at Georgia State University, wrote an interesting blog post on his site Linguistic Pulse beginning to do just that. After following the links presented from a sample search of “why do black people have big lips”, he says:

“So what happens when you do type in these searches? Well if you’re genuinely interested in the question enough to actually read some of the first results you find, my own experience here suggests that what you’ll be exposed to are sources that would not be considered credible in academic communities (and whose scholarly merits may be questionable) but nonetheless contain information designed to answer the question honestly using scientific theories (in this case evolutionary biology) and which often also acknowledge the over-generalization of the original question or the ideological norm that the question assumes (that is the question assumes Africans have ‘big’ noses only because they are being implicitly compared to ‘normal’ European noses).”

Nic does come across some traces of pseudo-scientific, white supremacist discourse, and misogynistic ideologies in the websites linked by auto-suggestion, but summarizes that “While [Google auto-complete] clearly suggests we live in a world of stereotyping and particularly negative stereotyping in the case of historically oppressed groups, it may also indicate the potential for challenging these stereotypes” and enters his own suggestion for further work, urging that:

“people who generate content critical of racist, homophobic, or sexist ideologies should attempt to make that content searchable by popular questions like ‘Why do black people have big noses?’ as well as accessible to broad audiences so that audiences relying on these stereotypes can have them challenged.”