Reflections from the CASS student challenge panel member, part 1

Each year, one student from an outside institution is appointed to ‘challenge’ CASS with concepts from their own novel research. Pamela Irwin, the 2013/2014 student challenge panel member, is beginning to wrap up her ‘term’, and has put together a series of reflections on the process. Read the first entry below.


I am a mature student with a background in health and higher education, and am currently completing my PhD in gerontology. My research centres on the interaction between age, gender and the community in the context of resilience in older women living on their own in rural Australia.

Although ageing is informed by many disciplines, my research route is via the broad domain of social sciences. Serendipitously, a peer review of a journal article was responsible for my formal exposure to linguistics and corpus linguistics. The reviewers indicated that my paper reflected a sociological rather than the requisite social psychology orientation, and while I was aware that my topic crossed these disciplines, I was not fully cognisant of the critical importance of language in differentiating these subtleties. As a result, I enrolled in a corpus linguistic programme designed to improve academic language use, and through the inaugural CASS summer school, I was then able to consolidate, expand and apply this knowledge. This immersion in the world of linguistics stimulated a new and growing interest in the ‘function’ of language in academia and everyday life.

However, I soon realised that my grounding in the grammatical structures of the English language was extremely basic. While I could identify the fundamental parts of speech, I could not parse a sentence, and any further analysis was well beyond my skill set. Since then, I have been introduced to new concepts (semiosis), terminology (concatenate), techniques (linguistic ‘friendly’ transcribing) and technology (WMatrix) amongst others, as well as being challenged to rethink and change some of my preconceived ideas (metaphor).

Here, my understanding of the figures of speech is particularly salient. Resilience, a key theme in my research, tends to have different meanings depending on both the subject and context. An overview of the literature suggests that resilience is often described metaphorically as ‘bouncing back’ in academic and popular psychology, whereas in an Australian setting, resilience is more likely to be associated with an image of ‘the (little) Aussie battler’ (Moore, 2010). In this context, resilience represents perseverance, with the ‘underdog’ battling against all odds to overcome hardship in adverse conditions. By contrast, at a systems (socio-ecological) level, resilience is not yet related to a specific metaphor or image. It is, however, closely linked to a related term, ‘panarchy’, that involves a dynamic process of adaptation and transformation.

Thus resilience is defined by a metaphor (a ball), an image (a battler) and a conceptual term (panarchy) in my study. These differences provide a rich ‘landscape’ to uncover with corpus linguistics.

Reference:

Moore, B. (2010). What’s their story? A history of Australian words. Melbourne: Oxford University Press.


Return soon to read Pamela’s next installment! Are you interested in becoming the next student challenge panel member? Apply to attend our free summer school to learn more.

Politeness and impoliteness in digital communication: Corpus-related explorations

Post-event review of the one-day workshop at Lancaster University

Topics don’t come much hotter than the forms of impoliteness or aggression that are associated with digital communication – flaming, trolling, cyberbullying, and so on. Yet academia has done surprisingly little to pull together experts in social interaction (especially (im)politeness) and experts in the new media, let alone experts in corpus-related work. That is, until last Friday, when the Corpus Approaches to Social Science Centre (@CorpusSocialSci) brought together fifteen such people from diverse backgrounds (from law to psychology) for an intense one-day workshop.


The scope of the workshop was broad. One cannot very well study impoliteness without considering politeness, since merely failing to be polite in a particular context could be taken as impoliteness. Similarly, the range of digital communication types – email, blogs, texts, tweets and so on – presents a varied terrain to navigate. And then there are plenty of corpus-related approaches and notions, including collocation, keywords, word sketches, etc.

Andrew Kehoe (@ayjaykay), Ursula Lutzky (@UrsulaLutzky) and Matt Gee (@mattbgee) kicked off the day with a talk on swearwords and swearing, based on their 628-million-word Birmingham Blog Corpus. Amongst other things, they showed how internet swearword/profanity filters would work rather better if they incorporated notions like collocation. For example, knowing the words that typically accompany items like balls and tart can help disambiguate neutral usages (e.g. “tennis balls”, “lemon tart”) from less salubrious usages! (See more research from Andrew here, from Ursula here, and from Matt here.)
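The collocation idea can be illustrated with a minimal sketch. It is not the speakers' method: the collocate lists, the function name `flag_ambiguous` and the two-word window are all assumptions made for illustration, and a real filter would derive its collocates from corpus statistics rather than a hand-written table.

```python
# Hypothetical neutral collocates, invented for illustration only.
# A real filter would derive these from corpus collocation statistics.
NEUTRAL_COLLOCATES = {
    "balls": {"tennis", "golf", "cricket"},
    "tart": {"lemon", "apple", "jam"},
}


def flag_ambiguous(text, window=2):
    """Return ambiguous words with no neutral collocate within `window` tokens."""
    # Crude tokenisation: split on whitespace, strip common punctuation.
    tokens = [t.strip(".,!?\"'()").lower() for t in text.split()]
    flagged = []
    for i, tok in enumerate(tokens):
        neutral = NEUTRAL_COLLOCATES.get(tok)
        if neutral is None:
            continue  # not an ambiguous item, nothing to check
        # Words up to `window` positions to the left and right of the item.
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if not neutral.intersection(context):
            flagged.append(tok)
    return flagged


print(flag_ambiguous("She baked a lemon tart."))
print(flag_ambiguous("What a load of balls!"))
```

Under these assumptions, the first sentence is left alone because "lemon" sits next to "tart", while the second is flagged because no neutral collocate appears near "balls".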

With Ruth Page’s (@ruthtweetpage) presentation came a switch from blogs to Twitter. Using corpus-related techniques, Ruth revealed the characteristics of corporate tweets. Given that the word sorry turns out to be the seventh most characteristic word, or keyword, for corporate tweets, it was not surprising that Ruth focused on apologies. She revealed that corporate tweets tend to avoid stating a problem or giving an explanation (thus avoiding damage to their reputation), but are accompanied by offers of repair and attempts to build – at least superficially – rapport. (See more research from Ruth here.)

Last of the morning was Caroline Tagg’s (@carotagg) presentation, and with this came another shift in medium, from Twitter to text messages. Focusing on convention and creativity, Caroline pointed out that, contrary to popular opinion, heavily abbreviated messages are not in fact the norm, and that when abbreviations do occur, they are often driven by communicative needs, e.g. using creativity to foster interest and engagement. Surveying the functions of texts, Caroline established that maintenance of friendship is key. And corpus-related techniques revealed the supporting evidence: politeness formulae were particularly frequent, including the salutation have a good one, the hedge a bit for the invitation, and for further contact, give us a bell. (See more research from Caroline here.)

With participants refuelled by lunch, Claire Hardaker (@DrClaireH) and I presented a smorgasbord of relevant issues. As an opening shot, we displayed frequencies showing that the stereotypical emblems of British politeness, words such as please, thank you, sorry, excuse me, can you X, tend not to be frequent in any digital media variety, relative to spoken conversation (as represented in the British National Corpus). Perhaps this accounts for why at least some sectors of the British public find digital media barren of politeness. This is not to say that politeness does not take place, but it seems to take place through different means – consider the list of politeness items derived by Caroline above. And there was an exception: sorry was the only item that occurred with greater frequency in some digital media. This, of course, nicely ties in with Ruth’s focus on apologies. The bulk of my and Claire’s presentation revolved around using corpus techniques to help establish: (1) definitions (e.g. what is trolling?), (2) strategies and formulae (e.g. what is the linguistic substance of trolling?) and (3) evaluations (e.g. what or who is considered rude?). Importantly, we showed that corpus-related approaches are not just lists of numbers, but can integrate qualitative analyses. (See more research from me here, and from Claire here.)

With encroaching presentation fatigue, the group decamped to a computer lab. Paul Rayson (@perayson) introduced some corpus tools, notably WMatrix, of which he is the architect. Amanda Potts (@watchedpotts) then put everybody through their paces – gently of course! – giving everybody the opportunity of valuable hands-on experience.

Back in our discussion room and refreshed by various caffeinated beverages, we spent an hour reflecting on a range of issues. The conversation moved towards corpora that include annotations (interpretative information). Such annotations could be a way of helping to analyse images, context, etc., creating an incredibly rich dataset that could only be interrogated by computer (see here, for instance). I noted that this end of corpus work was not far removed from using Atlas or Nudist. Snapchat came up in discussion, not only because it involves images (though they can include text), but also because it raises issues of data accessibility (how do you get hold of a record of this communication, if one of its essential features is that it dissolves within a narrow timeframe?). The thorny problem of ethics was discussed (e.g. data being used in ways that were not signaled when original user agreements were completed).

Though exhausting, it was a hugely rewarding and enjoyable day. Often those rewards came in the form of vibrant contributions from each and every participant. Darren Reed, for example, pointed out that sometimes what we were dealing with is neither digital text nor digital image, but a digital act. Retweeting somebody, for example, could be taken as a “tweet act” with politeness implications.

Visiting With The Brown Family

In 2011 I gave a plenary talk on how American English is changing over time (contrasting it with British English), using the Brown Family of corpora. Each member of the Brown family consists of a corpus of 1 million words of written, published, standard English, divided into 500 files of about 2000 words each. Fifteen genres of writing are represented – this framework being created decades ago when the original Brown corpus was compiled by Henry Kučera and W. Nelson Francis at Brown University, having the distinction of being the first publicly available corpus ever built. Containing only American texts published in 1961, it originally went by the name of A Standard Corpus of Present-Day Edited American English for use with Digital Computers but later became known as just the Brown Corpus. It was followed by an equivalent British version, with later members representing English from the 1990s, the 2000s and the 1930s. A 1901 British version is in the pipeline.

Before I gave my talk, however, Mark Davies gave a brilliant presentation on the COHA (Corpus of Historical American English), which has 400 million words and covers the period from 1800 to the present day. It was the proverbial hard act to follow. Compared to the COHA, the Brown family are tiny, and the coverage occurs across 30- or 15-year snapshots, rather than representing every year. If we identify, say, that the word Mr is less frequent in 2006 than in 1991, then it is tempting to say that Mr is becoming less frequent over time. But we don’t know for certain what corpora from all the years in between would tell us. Having multiple sampling points presents a more convincing picture, but judicious hedging must be applied.
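Because corpora of such different sizes cannot be compared on raw counts, frequencies are usually normalised to a common base such as occurrences per million words. The sketch below shows that arithmetic only; the function name `per_million` and the counts are hypothetical and are not taken from either corpus.

```python
def per_million(count, corpus_size):
    """Normalise a raw count to occurrences per million words."""
    return count * 1_000_000 / corpus_size


# Hypothetical counts, invented for illustration only.
small_corpus_rate = per_million(500, 1_000_000)        # a Brown-sized corpus
large_corpus_rate = per_million(120_000, 400_000_000)  # a COHA-sized corpus

print(small_corpus_rate, large_corpus_rate)
```

With these made-up figures the larger raw count turns out to be the lower rate, which is why normalised rather than raw frequencies are the ones to compare. A rate from a 1-million-word sample still rests on far fewer observations, so the hedging mentioned above applies.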

Also, being small, many words in the Brown family have tiny frequencies so it’s very difficult to make any claims about them. And the sampling could be viewed as rather outdated – the sorts of texts that people accessed in the 1960s are not necessarily the same as those they access now. There are no online texts in the Brown family (although to ease collection, both the 2006 members involved texts that were originally published in written form, then placed online). Nor is there any advertising text. Or song lyrics. Or horror fiction. Or erotica (although there is a section on Romantic Fiction which could be pushed in that direction). Finally, the fact that all the texts are of the published variety means that they tend to represent a somewhat standardised, conservative form of English. A lot of the innovation in English happens in much more informal contexts, especially where young people or people from different backgrounds mix together – inner-city playgrounds and internet forums being two good examples. By the time such innovation gets into written published standard English, it’s no longer innovative. So the Brown family can’t tell us about the cutting edge of language use – they’ll always be a few years out of fashion.

So what are the Brown family good for, if anything?
