Coming this year: Corpora and Discourse Studies (Palgrave Advances in Language and Linguistics)

Three members of CASS have contributed chapters to a new volume in the Palgrave Advances in Language and Linguistics series. Corpora and Discourse Studies will be released later this year.


The growing availability of large collections of language texts has expanded our horizons for language analysis, enabling the swift analysis of millions of words of data, aided by computational methods. This edited collection of chapters contains examples of such contemporary research which uses corpus linguistics to carry out discourse analysis. The book takes an inclusive view of the meaning of discourse, covering different text-types or modes of language, including discourse as both social practice and as ideology or representation. Authors examine a range of spoken, written, multimodal and electronic corpora covering themes which include health, academic writing, social class, ethnicity, gender, television narrative, news, Early Modern English and political speech. The chapters showcase the variety of qualitative and quantitative tools and methods that this new generation of discourse analysts is combining, offering a set of compelling models for future corpus-based research in discourse.

Table of Contents:

  1. Introduction; Paul Baker and Tony McEnery
  2. E-Language: Communication in the Digital Age; Dawn Knight
  3. Beyond Monomodal Spoken Corpora: Using a Field Tracker to Analyse Participants’ Speech at the British Art Show; Svenja Adolphs, Dawn Knight and Ronald Carter
  4. Corpus-assisted Multimodal Discourse Analysis of Television and Film Narratives; Monika Bednarek
  5. Analysing Discourse Markers in Spoken Corpora: Actually as a Case Study; Karin Aijmer
  6. Discursive Constructions of the Environment in American Presidential Speeches 1960-2013: A Diachronic Corpus-assisted Study; Cinzia Bevitori
  7. Health Communication and Corpus Linguistics: Using Corpus Tools to Analyse Eating Disorder Discourse Online; Daniel Hunt and Kevin Harvey
  8. Multi-Dimensional Analysis of Academic Discourse; Jack A. Hardy
  9. Thinking About the News: Thought Presentation in Early Modern English News Writing; Brian Walker and Dan McIntyre
  10. The Use of Corpus Analysis in a Multi-perspectival Study of Creative Practice; Darryl Hocking
  11. Corpus-assisted Comparative Case Studies of Representations of the Arab World; Alan Partington
  12. Who Benefits When Discourse Gets Democratised? Analysing a Twitter Corpus Around the British Benefits Street Debate; Paul Baker and Tony McEnery
  13. Representations of Gender and Agency in the Harry Potter Series; Sally Hunt
  14. Filtering the Flood: Semantic Tagging as a Method of Identifying Salient Discourse Topics in a Large Corpus of Hurricane Katrina Reportage; Amanda Potts

Coming to CASS to code: The first two months


After working at Waseda University in Japan for exactly 10 years, I was granted a one-year sabbatical in 2014 to concentrate on my corpus linguistics research. As my first choice of destination was Lancaster University, I was overjoyed to hear from Tony McEnery that the Centre for Corpus Approaches to Social Science (CASS) would be able to offer me office space and access to some of the best corpus resources in the world. I have now been at CASS for two months and thought this would be a good time to report on my experience here to date.

Since arriving at CASS, I have been working on several projects. My main project here is the development of a new database architecture that will allow AntConc, my freeware corpus analysis toolkit, to process very large corpora in a fast and resource-light way. The strong connection between applied linguistics and computer science at Lancaster has allowed me to work closely with some excellent computer science faculty and graduate students, including Paul Rayson, John Mariani, Stephen Wattam, and John Vidler. We just presented our first results at LREC 2014 in Reykjavik.

I’ve also been working closely with the CASS members, including Amanda Potts and Robbie Love, to develop a set of ‘mini’ corpus tools to help with the collection, cleaning, and processing of corpora. I have now released VariAnt, which is a tool that finds spelling variants in a corpus, and SarAnt, which allows multiple search-and-replace functions to be carried out in a corpus as a batch process. I am also just about to release TagAnt, which will finally give corpus linguists a simple and intuitive interface to popular freeware Part-Of-Speech (POS) tagging tools such as TreeTagger. I am hoping to develop more of these tools to help corpus linguists in CASS and around the world with the complex and time-consuming tasks that they have to perform each day.
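For the curious, the kind of batch search-and-replace that a tool like SarAnt automates can be sketched in a few lines of Python. This is purely an illustration of the general idea, not SarAnt’s actual code; the function name and rule format here are my own invention.

```python
import re

def batch_replace(texts, rules):
    """Apply an ordered list of (pattern, replacement) regex rules to
    every text in a corpus. Later rules see the output of earlier ones,
    so rule order matters."""
    compiled = [(re.compile(pattern), replacement) for pattern, replacement in rules]
    cleaned = []
    for text in texts:
        for pattern, replacement in compiled:
            text = pattern.sub(replacement, text)
        cleaned.append(text)
    return cleaned

# Example: normalise curly apostrophes, then collapse runs of whitespace.
rules = [
    (r"[\u2018\u2019]", "'"),
    (r"\s+", " "),
]
print(batch_replace(["It\u2019s  a   test."], rules))  # ["It's a test."]
```

Because each rule runs over the output of the previous one, a cleaning pipeline can be expressed as a single ordered rule list and applied uniformly across thousands of files.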

I always expected that I would enjoy my time at Lancaster, but I did not anticipate that I would enjoy it as much as I have. Lancaster University has a great campus, the research facilities are some of the best in the world, the CASS members have treated me like family since the day I arrived, and even the weather has been kind to me, with sunny days throughout April and May. I look forward to writing more about my projects here at CASS.

How to be a PhD student (by someone who just was), Part 2: Managing your work and working relationships

After submitting and successfully defending my thesis a few months ago, I’ve decided to share some ‘lessons learnt’ over the course of my 38 months as a PhD student. 

In Part 2 of this series, I’ll talk about best practices for structuring your work, managing your relationship with your supervisor, and my experience with teaching undergraduates. If you missed “Part 1: Preparing for the programme”, you can read it here.



Structuring your work

I believe it’s healthy to treat your PhD—as much as possible—like a job. Like any job, a PhD has physical, social, and temporal boundaries.

Try to create a PhD ‘space’. Make use of your office if you’ve been given one at your university, and create a ‘work area’ within your home if you haven’t. Working from bed, from the sofa, or from a café means that your PhD is infiltrating all areas of your life. While some degree of this is inevitable, it’s best to maintain physical boundaries as much as possible, even if the boundary is just your own desk.

By the same token, making friends outside of your department or your field is helpful in many ways. I adore my friends from Linguistics and I couldn’t have finished my doctorate without them, but you wouldn’t spend all of your downtime with colleagues from work, and the same applies here. In a group of people who share a background, you might end up talking about your field ‘outside of hours’. This can be stimulating, but also exhausting. You may want to vent about your department, or talk about something other than your PhD or field, even trashy TV! It’s easier with friends from other areas. As a bonus, the connections that you make outside of your field can also help you inside it. I’ve had very good advice from friends working in statistics, gotten ideas from historians, and been inspired by literary scholars, even though I might never venture into these areas in the library.

If you can, also create a routine for yourself, even if this isn’t 9-5. It’s best if this routine involves physically moving locations, but even if it doesn’t, physically change something: take a shower, get dressed for work. Pick 8 hours within the day that you work best, and work during those hours. Don’t be too hard on yourself if you have a short day or miss days out entirely…a PhD is ‘swings and roundabouts’ as they say around here…it’s long enough that you will make up the time to yourself. As much as possible, take the weekends and holidays off. This might mean working longer than 8 hours on weekdays, but personally, I think it’s worth it. Many people study in a place far from where they grew up, and a PhD is one time in life where you can be flexible enough with your time to enjoy a bit of sightseeing and tourism.

During this routine, set clear goals for yourself. I’ve seen people arguing for and against writing something every day. I found it very helpful to set a daily word count goal for myself, then sit in front of a computer until I at least came close. The number isn’t important: at the start of my PhD, I aimed to write 200 words per day; at the end of my PhD, I was able to write 1,000 words per day. What is important is getting into a routine. You will sit down some days and feel horrible. You’ll have writer’s block. You will struggle through each word of those 200, and know that you’ll delete most of them. But it’s much easier to get 40 great words out of 200 bad ones than to write 40 words completely cold. I’ve written entire chapters three times as long as they needed to be, and hated them. But paring them down is cathartic—it’s like sculpting. The bonus is that when you get into the habit of writing every day, you slowly get into the habit of writing something good every day. Soon, you’ll be writing 100 words and keeping 50 of them. Then you’ll be writing 1,000 words and keeping 900 of them. The important part is keeping the pace: just write! Your supervisor will also appreciate having something tangible to mark your progress (see next section).

As far as the structure of my own work, there are three things that I would do differently, if I could do it all again:

  1. Decide on a reference manager and stick to it diligently from Day 1. At the start of my degree I used EndNote for reference management, as this was offered for free by my university and came in both desktop and web versions. For my whole first year, I used EndNote to create an annotated bibliography, an extremely useful resource when drafting your literature review. However, EndNote began crashing on me, and my attached papers were no longer accessible. In my second year, I stopped keeping track of references and just kept haphazard folders of PDFs. In my third year, I just used in-line citations, believing that sources would be easy to find later on. Not true! The month before submission I decided to make the leap to Mendeley, a truly amazing (free) reference manager that allows you to build and share libraries, store your PDFs, search other people’s collections, and select from a vast array of output styles (I favour APA 6th edition). The transition was extraordinarily painful. Exporting from EndNote was problematic and buggy, scanning PDFs in Mendeley was error-prone, and finding the corresponding works for some in-line references was impossible. I wasted a solid week just before submission sorting out my references, maintenance that would have been painless had I kept on top of it all along.
  2. Master MS Word early on. In my final year, I finally got serious about standardising the numbering of my tables and figures, which means that in the eleventh hour, I was still panicking, trying to make sure that I had updated everything to the proper styles and made appropriate in-line references to my data. Had I set my styles earlier on and made the best use of MS Word’s quite intuitive counting and cross-referencing mechanisms, I would have saved myself days of close reading. If you are using MS Word (sorry, I can’t say anything about LaTeX) and you are not using the citation manager or cross-reference tool, learn how to do that immediately. Today. Your library might have a class on it, or, like me, you can brush up in an hour of web searching.
  3. Put down the books earlier. At a certain point, you need to generate new research and make a novel contribution to knowledge. Your first year and much of your second year will be dedicated to making sure that a research gap exists, and that you can pay tribute to all of the giants whose shoulders you will be standing on. However, burying yourself in a library for three years reading everyone else’s great works is a good way to paralyse yourself. Of course you will always need to keep up with the times, but at a certain point, your rate of writing will overtake your rate of reading. If I could do it again, I would follow a pattern more like this:

[Figure: the balance of time spent reading vs. writing over the course of the PhD]

After the first year, you won’t be missing anything totally fundamental. After the second year, you won’t be missing anything peripheral. If, in the third year, you’ve missed something very fresh, your examiners will point it out. But the more important thing is to make a contribution. Most of the PhD is research, not literature review. Your supervisor will be able to help you with this, and with some other things (but not everything), as I discuss below.

Managing your relationship with your supervisor

Continue reading

Introducing CASS 1+3 Research Student: Robbie Love

In 2013, the ESRC Centre for Corpus Approaches to Social Science was pleased to award its inaugural 1+3 (Masters to PhD) studentship to Robbie Love. Read a bit about the first year of his postgraduate experience, in Robbie’s own words below.


I am a Research Student at CASS in the first year of a 1+3 PhD studentship. My main role is to investigate methodological issues in the collection of spoken corpora, but I also have interests in corpus-assisted critical discourse analysis.

I grew up in the north east of England in Blyth, Northumberland and Forest Hall in the outskirts of Newcastle. At school I found equal enjoyment in studying both English language and mathematics, but when deciding what to take at university I couldn’t think of something that would satisfy both, so I went with language.

I moved to Lancaster in 2010 to study my BA in English Language, which I soon converted to Linguistics. It was only in my third year that I was introduced to corpus linguistics, and became fascinated with its potential for revealing things about the way we communicate which I would never have predicted. I also liked its combination of quantitative and qualitative analysis, so it seemed like the perfect way to reengage with my enjoyment of maths. I had always been open to the idea of postgraduate study so when the opportunity came up to join CASS under the supervision of Tony McEnery it felt like the best thing for me to do.

Since joining CASS in the summer last year I have worked on several interesting projects including the changing language of gay rights opposition in Parliamentary debates (with Paul Baker), comments on online newspaper articles (with Amanda Potts), and the representation of Muslim people and Islam in the press reaction to the 2013 Woolwich incident (with Tony McEnery). I will be presenting findings on the Woolwich project at the upcoming Young Linguists’ Meeting in Poznań.

When I’m not playing with words on a computer, I am usually found rehearsing for a play or musical, playing my keyboard or eating any and all varieties of hummus.


Visit our People page for a full list of the centre’s investigators, researchers, and students.

Politeness and impoliteness in digital communication: Corpus-related explorations

Post-event review of the one-day workshop at Lancaster University

Topics don’t come much hotter than the forms of impoliteness or aggression that are associated with digital communication – flaming, trolling, cyberbullying, and so on. Yet academia has done surprisingly little to pull together experts in social interaction (especially (im)politeness) and experts in the new media, let alone experts in corpus-related work. That is, until last Friday, when the Corpus Approaches to Social Science Centre (@CorpusSocialSci) brought together fifteen such people from diverse backgrounds (from law to psychology) for an intense one-day workshop.


The scope of the workshop was broad. One cannot very well study impoliteness without considering politeness, since merely failing to be polite in a particular context could be taken as impoliteness. Similarly, the range of digital communication types – email, blogs, texts, tweets and so on – presents a varied terrain to navigate. And then there are plenty of corpus-related approaches and notions, including collocation, keywords, word sketches, etc.

Andrew Kehoe (@ayjaykay), Ursula Lutzky (@UrsulaLutzky) and Matt Gee (@mattbgee) kicked off the day with a talk on swearwords and swearing, based on their 628-million-word Birmingham Blog Corpus. Amongst other things, they showed how internet swearword/profanity filters would work rather better if they incorporated notions like collocation. For example, knowing the words that typically accompany items like balls and tart can help disambiguate neutral usages (e.g. “tennis balls”, “lemon tart”) from less salubrious usages! (See more research from Andrew here, from Ursula here, and from Matt here.)
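The collocation idea behind this can be sketched very simply (a toy illustration of my own, assuming a tokenised corpus, not the presenters’ actual system): profile the words that occur within a small window of a flagged item, and treat an occurrence as neutral when a known innocent collocate appears nearby.

```python
from collections import Counter

def window_collocates(tokens, target, window=2):
    """Build a crude collocation profile: count the words occurring
    within `window` tokens of each occurrence of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(t for j, t in enumerate(tokens[lo:hi], lo) if j != i)
    return counts

def looks_neutral(tokens, index, safe_collocates, window=2):
    """Judge the occurrence at `index` neutral if any nearby word is a
    known innocent collocate (e.g. 'tennis' near 'balls')."""
    lo, hi = max(0, index - window), min(len(tokens), index + window + 1)
    return any(t in safe_collocates
               for j, t in enumerate(tokens[lo:hi], lo) if j != index)

tokens = "he served two tennis balls over the net".split()
print(looks_neutral(tokens, tokens.index("balls"), {"tennis", "lemon"}))  # True
```

A real filter would of course derive the safe-collocate lists statistically from a large corpus such as the Birmingham Blog Corpus rather than hand-coding them, but the disambiguation principle is the same.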

With Ruth Page’s (@ruthtweetpage) presentation came a switch from blogs to Twitter. Using corpus-related techniques, Ruth revealed the characteristics of corporate tweets. Given that the word sorry turns out to be the seventh most characteristic word (keyword) for corporate tweets, it was not surprising that Ruth focused on apologies. She revealed that corporate tweets tend to avoid stating a problem or giving an explanation (thus avoiding damage to their reputation), but are accompanied by offers of repair and attempts to build – at least superficially – rapport. (See more research from Ruth here.)

Last of the morning was Caroline Tagg’s (@carotagg) presentation, and with this came another shift in medium, from Twitter to text messages. Focusing on convention and creativity, Caroline pointed out that, contrary to popular opinion, heavily abbreviated messages are not in fact the norm, and that when abbreviations do occur, they are often driven by communicative needs, e.g. using creativity to foster interest and engagement. Surveying the functions of texts, Caroline established that maintenance of friendship is key. And corpus-related techniques revealed the supporting evidence: politeness formulae were particularly frequent, including the salutation have a good one, the hedge a bit for the invitation, and for further contact, give us a bell. (See more research from Caroline here.)

With participants refuelled by lunch, Claire Hardaker (@DrClaireH) and I presented a smorgasbord of relevant issues. As an opening shot, we displayed frequencies showing that the stereotypical emblems of British politeness, words such as please, thank you, sorry, excuse me, can you X, tend not to be frequent in any digital media variety, relative to spoken conversation (as represented in the British National Corpus). Perhaps this accounts for why at least some sectors of the British public find digital media barren of politeness. This is not to say that politeness does not take place, but it seems to take place through different means – consider the list of politeness items derived by Caroline above. And there was an exception: sorry was the only item that occurred with greater frequency in some digital media. This, of course, nicely ties in with Ruth’s focus on apologies. The bulk of my and Claire’s presentation revolved around using corpus techniques to help establish: (1) definitions (e.g. what is trolling?), (2) strategies and formulae (e.g. what is the linguistic substance of trolling?) and (3) evaluations (e.g. what or who is considered rude?). Importantly, we showed that corpus-related approaches are not just lists of numbers, but can integrate qualitative analyses. (See more research from me here, and from Claire here.)
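Frequency comparisons of this kind are typically backed by a keyness statistic. As an illustration (my own sketch, not the presenters’ code), the widely used log-likelihood measure (Rayson and Garside 2000) scores how sharply a word’s frequency in a study corpus departs from what a reference corpus such as the BNC would predict:

```python
import math

def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness score for one word: compares the observed
    frequencies in the study and reference corpora against the
    frequencies expected if the word were evenly distributed."""
    combined = freq_study + freq_ref
    total = size_study + size_ref
    expected_study = size_study * combined / total
    expected_ref = size_ref * combined / total
    ll = 0.0
    if freq_study > 0:
        ll += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll

# Hypothetical counts: a word occurring 500 times in a 1M-word study
# corpus versus 300 times in a 10M-word reference corpus.
print(round(log_likelihood(500, 1_000_000, 300, 10_000_000), 1))
```

A score of zero means the word occurs at exactly the expected rate; very high scores flag candidate keywords worth qualitative inspection in concordance lines.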

With encroaching presentation fatigue, the group decamped to a computer lab. Paul Rayson (@perayson) introduced some corpus tools, notably Wmatrix, of which he is the architect. Amanda Potts (@watchedpotts) then put everybody through their paces – gently of course! – giving everybody the opportunity of valuable hands-on experience.

Back in our discussion room and refreshed by various caffeinated beverages, we spent an hour reflecting on a range of issues. The conversation moved towards corpora that include annotations (interpretative information). Such annotations could be a way of helping to analyse images, context, etc., creating an incredibly rich dataset that could only be interrogated by computer (see here, for instance). I noted that this end of corpus work was not far removed from using qualitative analysis tools like ATLAS.ti or NUD*IST. Snapchat came up in discussion, not only because it involves images (though they can include text), but also because it raises issues of data accessibility (how do you get hold of a record of this communication, if one of its essential features is that it dissolves within a narrow timeframe?). The thorny problem of ethics was discussed (e.g. data being used in ways that were not signalled when original user agreements were completed).

Though exhausting, it was a hugely rewarding and enjoyable day. Often those rewards came in the form of vibrant contributions from each and every participant. Darren Reed, for example, pointed out that sometimes what we were dealing with is neither digital text nor digital image, but a digital act. Retweeting somebody, for example, could be taken as a “tweet act” with politeness implications.

CASS affiliated papers to be given at the upcoming 5th International Language in the Media Conference

In two weeks, several scholars affiliated with the Centre will be heading south to attend the 5th International Language in the Media Conference, taking place this year at Queen Mary, University of London. We are particularly excited about the theme — “Redefining journalism: Participation, practice, change” — as well as the conference’s continued prioritization of papers on “language and class, dis/ability, race/ethnicity, gender/sexuality and age; political discourse, commerce and global capitalism” (among other important themes). As a taster for those of you who will be joining us in London and an overview for those who are unfortunately unable to make it this year, abstracts of the CASS affiliated papers to be given at the conference are reproduced below.


“I hate that tranny look”: a corpus-based analysis of the representation of trans people in the national UK press

Paul Baker

In early 2013, two high-profile incidents involving press representation of trans people resulted in claims that the British press were transphobic. For example, Jane Fae wrote in The Independent that ‘the trans community… is now a stand-in for various minorities… and a useful whipping girl for the national press… trans stories are only of interest when trans folk star as villains’ (1/13/13). This paper examines Fae’s claims by using methods from corpus linguistics in order to identify the most frequent and salient representations of trans people in the national UK press. Corpus approaches use computational tools as an aid in human research, offering a good balance between quantitative and qualitative analyses. My analysis is based upon previous corpus-based research where I have examined the construction of gay people, refugees and asylum seekers, and Muslims in similar contexts.

Using a 660,000 word corpus of news articles about trans people published in 2012, I employ concordancing techniques to examine collocates and discourse prosodies of terms like transgender, transsexual and tranny, in order to identify repetitive patterns of representation that occur across newspapers. I compare such patterns to sets of guidelines on language use by groups like The Beaumont Society, and discuss how certain representations can be enabled by the Press Complaints Commission’s Code of Practice. While the analysis found that there are very different patterns of representation around the three labels under investigation, all of them showed a general preference for negative representations, with occasional glimpses of more positive journalism.


“I think we’d rather be called survivors”: A corpus-based critical discourse analysis of the semantic preferences of referential strategies in Hurricane Katrina news articles as indicators of ideology

Amanda Potts

In times of great crisis, people often rely upon the discourse of powerful institutions to help frame experiences and reinforce established ideologies (van Dijk 1985). Selection of referential strategies in such discourses can reveal much about our society; for instance, some words have the power to comfort addressees but further oppress the referents. Taking a corpus-based critical discourse analytical approach, in this paper I explore the discursive cues of underlying ideology (of both the publications and perhaps the assumed audience) with special attention on journalists’ referential and predicational strategies (Reisigl and Wodak 2000). Analysis is based on a custom-compiled 36.7-million-word corpus of American news print articles concerning Hurricane Katrina.

A variety of forms of reference have been identified in the corpus using part-of-speech tagged word lists. Collocates of each form of reference have been calculated and automatically assigned a semantic tag by the UCREL USAS tagger (Archer et al. 2002). Semantic categories represented by the highest proportion of collocates overall have been identified as the most salient indicators of ideology.

The semantic preferences of the referential strategies are found to be quite distinct. For instance, resident prefers the M: Movement semantic category, whereas collocates of evacuee tend to fall under N: Numbers. This may prime readers to interpret Gulf residents and evacuees as large, threatening, ‘invading’ masses (often in conjunction with negative water metaphors such as flood). The highest collocate semantic category for victim, displaced, and survivor is S: Social actions, states and processes, indicating that the [social] experiences of these referents—such as being helped or stranded, or linked to social identities such as wife—are foregrounded rather than their numbers or movement.

Finally, the plummeting frequency of refugee following a unique debate in the media over the word’s meaning and even its semantic preference will also be discussed as an illustrative example of how unconscious language patterns can sometimes come to the fore in contested usage and influence the journalistic lexicon. Following from this, a more considered use of referential strategies is recommended, particularly in the media, where this could encourage heightened compassion for, and understanding of, those gravely affected by catastrophic events.


Journalism through the Guardian’s goggles

Anna Marchi

‘Journalism is an intensely reflexive occupation, which constantly talks to and about itself’ (Aldridge and Evetts 2003: 560). Journalists create interpretative communities (Zelizer 2004) through the discourses they circulate about their profession; the meaning and role of journalism are constituted through daily performance (Matheson 2003) and can be studied by means of the self-reflexive traces in texts. That is, they can be detected and studied in a newspaper corpus.

This paper proposes a corpus-assisted discourse analysis (Partington 2009) of the ways journalists represent their trade in their own news-work. The focus of the research is on one newspaper in particular: the Guardian. Previous research (Marchi and Taylor 2009) suggested that among British broadsheets the Guardian is by far the most interested in other media, as well as the most inclined to talk about itself. Using newspaper data from 2005, a particularly relevant year in the newspaper’s biography (it changed format from traditional broadsheet to berliner) and rich with self-reflexivity, I examine the discursive behaviour of media-related lexical items in the corpus (such as journalist, reporter, hack, media, newspaper, press, tabloid), exploring the ways in which the Guardian conceptualises the role of the news media, how it represents professional values and the divide between good and bad journalism, and, ultimately, how it constructs its own identity. The study relies on the typical tools of corpus linguistics research – collocation analysis, keywords analysis, concordance analysis – and aims at a comprehensive description of the data, following the principle of total accountability (McEnery and Hardie 2012: 17), while keeping track of the broader extralinguistic context. From a methodological point of view this work encourages interdisciplinary contamination and a serendipitous approach to the data, and wishes to offer an example of how corpus-based research can contribute to the academic investigation of journalism across disciplines.
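For readers new to the concordance analysis mentioned across these abstracts, a minimal key-word-in-context (KWIC) display can be sketched in a few lines (an illustrative toy of my own, not the software used in any of these studies):

```python
def kwic(tokens, node, context=4):
    """Return key-word-in-context lines: every occurrence of `node`
    shown with `context` tokens of left and right co-text."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - context):i])
            right = " ".join(tokens[i + 1:i + 1 + context])
            lines.append(f"{left:>35} [{tok}] {right}")
    return lines

tokens = "the paper examines how transgender people are represented".split()
for line in kwic(tokens, "transgender", context=3):
    print(line)
```

Aligning every hit on the node word like this is what lets analysts read vertically down thousands of concordance lines and spot the repeated patterns of representation the abstracts describe.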


Visit the conference website for more details, including a list of plenary speakers.

Beyond ‘auto-complete search forms’: Notes on the reaction to ‘Why do white people have thin lips?’

As Paul Baker reported yesterday, a paper that we co-authored entitled “‘Why do white people have thin lips?’ Google and the perpetuation of stereotypes via auto-complete search forms” (published 2013 in Critical Discourse Studies 10:2) has recently been garnering some media attention, being cited in the Mail Online and the 18 May 2013 print issue of The Daily Telegraph (image below). Our findings — that “the auto-complete search algorithm offered by the search tool Google can produce suggested terms which could be viewed as racist, sexist or homophobic” — come as a German court has ruled that Google “must ensure terms generated by auto-complete are not offensive or defamatory” (BBC News, 14 May 2013). Similar, earlier cases of (personal) libel and defamation were recalled by both Paul and me during the process of our investigation, but — serious as it may be — the thrust of this study was not the potential for damage to individuals, but rather to entire social groups. We found that:

“Certain identity groups were found to attract particular stereotypes or qualities. For example, Muslims and Jewish people were linked to questions about aspects of their appearance or behaviour, while white people were linked to questions about their sexual attitudes. Gay and black identities appeared to attract higher numbers of questions that were negatively stereotyping.”

The nature of Google auto-complete is such that the content presented appears because a relatively high number of previous users have typed these strings into the search box. We argue, then, that the appearance of such a high frequency of (largely negatively) stereotyping results indicates that “humans may have already shaped the Internet in their image, having taught stereotypes to search engines and even trained them to hastily present these as results of ‘top relevance’.” This finding has been somewhat misinterpreted by the press; the short title revealed in the URL for the Mail Online article and used in the top ticker — ‘Is Google making us RACIST?’ — actually reverses the agency in this process, as we have argued that, in fact, users may have made Google racist.

This ties in to the main suggestion that we make in the conclusion of the article, that “there should be a facility to flag certain auto-completion statements or questions as problematic”, much the same as the ‘down-votes’ utilised in the Google-owned and -operated site YouTube. The argument here being: if auto-complete results have been crowd-sourced from Google users, why not empower the same users to work as mass moderators?

The other main point in our conclusion section was that this was not (and could not have been) a reception study “in that we are unable to make generalisations about the effects on users of encountering unexpected auto-complete question forms in Google”, but that this was an area ripe for further research.

“Hall’s (1973) notion of dominant, oppositional and negotiated resistant readings indicates that audiences potentially have complex and varying reactions to a particular ‘text’. As noted earlier, we make no claim that people who see questions which contain negative social stereotypes will come to internalise such stereotypes. A similar-length (at least) paper to this one would be required to do justice to how individuals react to these question forms. And part of such a reception study would also involve examining the links to various websites which appear underneath the auto-completed questions. Do such links lead to pages which attempt to confirm or refute the stereotyping questions?”

In short, we had found that Google auto-complete did offer a high frequency of (largely negative) stereotyping questions, and did not offer a way for users to problematise these at the point of presentation. What we did not find was that "Google searches 'boost prejudice'", though we did hope to spark a discussion on the topic, and to indicate that the field is open for researchers willing to conduct reception studies.

Daily Telegraph 18.05.13

Nic Subtirelu, a PhD student in the Department of Applied Linguistics and ESL at Georgia State University, wrote an interesting blog post on his site Linguistic Pulse beginning to do just that. After following the links presented from a sample search of “why do black people have big lips”, he says:

“So what happens when you do type in these searches? Well if you’re genuinely interested in the question enough to actually read some of the first results you find, my own experience here suggests that what you’ll be exposed to are sources that would not be considered credible in academic communities (and whose scholarly merits may be questionable) but nonetheless contain information designed to answer the question honestly using scientific theories (in this case evolutionary biology) and which often also acknowledge the over-generalization of the original question or the ideological norm that the question assumes (that is the question assumes Africans have ‘big’ noses only because they are being implicitly compared to ‘normal’ European noses).”

Nic does come across some traces of pseudo-scientific, white supremacist discourse, and misogynistic ideologies in the websites linked by auto-suggestion, but summarises that "While [Google auto-complete] clearly suggests we live in a world of stereotyping and particularly negative stereotyping in the case of historically oppressed groups, it may also indicate the potential for challenging these stereotypes" and offers his own suggestion for further work, urging that:

“people who generate content critical of racist, homophobic, or sexist ideologies should attempt to make that content searchable by popular questions like ‘Why do black people have big noses?’ as well as accessible to broad audiences so that audiences relying on these stereotypes can have them challenged.”

In the press: Google and the perpetuation of stereotypes

The findings of a paper published by myself and Amanda Potts on the implications of Google’s auto-complete search function have been reported in Mail Online and The Telegraph (18 May 2013).

The paper examined what happens when the beginnings of questions about different identity groups are entered into Google's search form. For example, typing "why do black", "do gay people" and "should jews" results in Google offering auto-complete suggestions which could be considered offensive or which perpetuate stereotypes.

[Screenshot: Google auto-complete suggestions for "why do black"]

The paper’s aims were to raise questions about the appropriacy of such auto-completes but also to investigate which sorts of stereotyping questions tend to be associated with different identities. We categorised 2,690 such questions as they occurred across 12 social groups, finding that the groups with the most negative stereotypes associated with them were male, black and gay people.
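The kind of tally behind this finding can be sketched as follows. The coded data and the 'negative' evaluation labels below are invented examples for illustration only; they are not drawn from the paper's actual corpus of 2,690 questions, and the function name is our own.

```python
# Illustrative sketch: count stereotyping questions per social group
# and rank groups by how many negatively-evaluated questions they
# attract. The data and labels here are hypothetical examples.
from collections import Counter

# (social group, question, evaluation) - invented coded examples
coded_questions = [
    ("black people", "why do black people have big lips", "negative"),
    ("black people", "why do black people like chicken", "negative"),
    ("gay people", "do gay people choose to be gay", "negative"),
    ("gay people", "do gay people go to heaven", "neutral"),
    ("male", "why do men lie", "negative"),
]


def rank_by_negative(coded):
    """Return (group, count) pairs ordered by negative-question count."""
    negatives = Counter(group for group, _, evaluation in coded
                        if evaluation == "negative")
    return negatives.most_common()
```

With a real coded dataset, the head of the ranking would identify the groups most heavily targeted by negative stereotyping questions, which is the comparison the paper reports across its 12 social groups.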

[Screenshot: Google auto-complete suggestions for "should jews"]

Our paper does not argue that people reading such questions will automatically internalise such stereotypes (although younger or uncritical users of Google may do so, and people who already hold those stereotypes may feel that they are validated). However, we believe that there should be an option to flag certain suggestions as offensive, with those receiving a certain level of complaints removed or hidden, similar to the system used for YouTube comments.

[Screenshot: Google auto-complete suggestions for "do gay people"]

Baker, P. & Potts, A. (2013) "Why do white people have thin lips?": Google and the perpetuation of stereotypes via auto-complete search forms. Critical Discourse Studies 10(2), pp. 187-204.

Available here.