Post-event review of the one-day workshop at Lancaster University
Topics don't come much hotter than the forms of impoliteness or aggression that are associated with digital communication: flaming, trolling, cyberbullying, and so on. Yet academia has done surprisingly little to pull together experts in social interaction (especially (im)politeness) and experts in the new media, let alone experts in corpus-related work. That is, until last Friday, when the Corpus Approaches to Social Science Centre (@CorpusSocialSci) invited fifteen such people from diverse backgrounds (from law to psychology) to gather for an intense one-day workshop.
The scope of the workshop was broad. One cannot very well study impoliteness without considering politeness, since merely failing to be polite in a particular context could be taken as impoliteness. Similarly, the range of digital communication types (email, blogs, texts, tweets and so on) presents a varied terrain to navigate. And then there are plenty of corpus-related approaches and notions, including collocation, keywords, word sketches, etc.
Andrew Kehoe (@ayjaykay), Ursula Lutzky (@UrsulaLutzky) and Matt Gee (@mattbgee) kicked off the day with a talk on swearwords and swearing, based on their 628-million-word Birmingham Blog Corpus. Amongst other things, they showed how internet swearword/profanity filters would work rather better if they incorporated notions like collocation. For example, knowing the words that typically accompany items like “balls” and “tart” can help disambiguate neutral usages (e.g. “tennis balls”, “lemon tart”) from less salubrious usages! (See more research from Andrew here, from Ursula here, and from Matt here.)
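To make the collocation idea concrete, here is a minimal Python sketch of a filter that checks the neighbouring words before flagging an ambiguous item. It is an illustration only, not the presenters' method: the function name `flag_tokens`, the window size and the hand-written collocate lists are all assumptions, and a real filter would derive its collocates from corpus statistics.

```python
import re

# Hypothetical lists of "neutral" collocates for ambiguous words.
# These are invented for illustration, not taken from any corpus.
NEUTRAL_COLLOCATES = {
    "balls": {"tennis", "golf", "cricket", "meat"},
    "tart": {"lemon", "apple", "treacle", "custard"},
}


def flag_tokens(text, window=2):
    """Return (position, word) pairs for ambiguous words that have
    no neutral collocate within `window` tokens on either side."""
    tokens = re.findall(r"[a-z']+", text.lower())
    flagged = []
    for i, tok in enumerate(tokens):
        if tok not in NEUTRAL_COLLOCATES:
            continue
        # Collect the words immediately before and after the item.
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if not NEUTRAL_COLLOCATES[tok] & set(context):
            flagged.append((i, tok))
    return flagged
```

With these toy lists, “She served a lemon tart” passes untouched, whereas “What a load of balls” is flagged because no neutral collocate appears nearby.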
With Ruth Page's (@ruthtweetpage) presentation came a switch from blogs to Twitter. Using corpus-related techniques, Ruth revealed the characteristics of corporate tweets. Given that “sorry” turns out to be the seventh most characteristic word, or keyword, for corporate tweets, it was not surprising that Ruth focused on apologies. She revealed that corporate tweets tend to avoid stating a problem or giving an explanation (thus avoiding damage to their reputation), but are accompanied by offers of repair and attempts to build, at least superficially, rapport. (See more research from Ruth here.)
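For readers unfamiliar with keywords in the corpus sense, the sketch below ranks words by how much more frequent they are in a target corpus than in a reference corpus, using the log-likelihood statistic commonly applied in corpus linguistics. It is an assumption that this particular statistic matches what was used in the talk; the function names and the toy data in the usage note are invented for illustration.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness score.
    a, b: frequency of the word in the target / reference corpus.
    c, d: total number of tokens in the target / reference corpus."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(target_tokens, reference_tokens, top=5):
    """Return the `top` words most over-represented in the target corpus."""
    t, r = Counter(target_tokens), Counter(reference_tokens)
    c, d = len(target_tokens), len(reference_tokens)
    scored = []
    for word, a in t.items():
        b = r[word]
        # Keep only words that are relatively more frequent in the target.
        if a / c > b / d:
            scored.append((log_likelihood(a, b, c, d), word))
    scored.sort(reverse=True)
    return [word for _, word in scored[:top]]
```

Given a handful of invented apology-style tweets as the target and some unrelated text as the reference, a word such as “sorry” rises to the top of the list because it is far more frequent in the target than the reference would predict.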
Last of the morning was Caroline Tagg's (@carotagg) presentation, and with this came another shift in medium, from Twitter to text messages. Focusing on convention and creativity, Caroline pointed out that, contrary to popular opinion, heavily abbreviated messages are not in fact the norm, and that when abbreviations do occur, they are often driven by communicative needs, e.g. using creativity to foster interest and engagement. Surveying the functions of texts, Caroline established that maintenance of friendship is key. And corpus-related techniques revealed the supporting evidence: politeness formulae were particularly frequent, including the salutation “have a good one”, the hedge “a bit” for invitations, and, for further contact, “give us a bell”. (See more research from Caroline here.)
With participants refuelled by lunch, Claire Hardaker (@DrClaireH) and I presented a smorgasbord of relevant issues. As an opening shot, we displayed frequencies showing that the stereotypical emblems of British politeness, words such as “please”, “thank you”, “sorry”, “excuse me” and “can you X”, tend not to be frequent in any digital media variety, relative to spoken conversation (as represented in the British National Corpus). Perhaps this accounts for why at least some sectors of the British public find digital media barren of politeness. This is not to say that politeness does not take place, but it seems to take place through different means; consider the list of politeness items derived by Caroline above. And there was an exception: “sorry” was the only item that occurred with greater frequency in some digital media. This, of course, nicely ties in with Ruth's focus on apologies. The bulk of my and Claire's presentation revolved around using corpus techniques to help establish: (1) definitions (e.g. what is trolling?), (2) strategies and formulae (e.g. what is the linguistic substance of trolling?) and (3) evaluations (e.g. what or who is considered rude?). Importantly, we showed that corpus-related approaches are not just lists of numbers, but can integrate qualitative analyses. (See more research from me here, and from Claire here.)
With presentation fatigue encroaching, the group decamped to a computer lab. Paul Rayson (@perayson) introduced some corpus tools, notably WMatrix, of which he is the architect. Amanda Potts (@watchedpotts) then put everybody through their paces (gently, of course!), giving everybody the opportunity of valuable hands-on experience.
Back in our discussion room and refreshed by various caffeinated beverages, we spent an hour reflecting on a range of issues. The conversation moved towards corpora that include annotations (interpretative information). Such annotations could be a way of helping to analyse images, context, etc., creating an incredibly rich dataset that could only be interrogated by computer (see here, for instance). I noted that this end of corpus work was not far removed from using Atlas or Nudist. Snapchat came up in discussion, not only because it involves images (though they can include text), but also because it raises issues of data accessibility (how do you get hold of a record of this communication, if one of its essential features is that it dissolves within a narrow timeframe?). The thorny problem of ethics was discussed (e.g. data being used in ways that were not signaled when original user agreements were completed).
Though exhausting, it was a hugely rewarding and enjoyable day. Often those rewards came in the form of vibrant contributions from each and every participant. Darren Reed, for example, pointed out that sometimes what we were dealing with was neither digital text nor digital image, but a digital act. Retweeting somebody, for example, could be taken as a “tweet act” with politeness implications.