2014/15 in retrospective: Perspectives on Chinese

Looking back over the academic year as it draws to a close, one of the highlights for us here at CASS was the one-day seminar we hosted in January on Perspectives on Chinese: Talks in Honour of Richard Xiao. This event celebrated the contributions to linguistics of CASS co-investigator Dr. Richard Zhonghua Xiao, on the occasion of both his retirement in October 2014 (and simultaneous taking-up of an honorary position with the University!), and the completion of the two funded research projects which Richard has led under the aegis of CASS.

The speakers included present and former collaborators with Richard – some (including myself) from here at Lancaster, others from around the world – as well as other eminent scholars working in the areas that Richard has made his own: Chinese corpus linguistics (especially, but not only, comparative work), and the allied area of the methodologies that Richard’s work has both utilised and promulgated.

In the first presentation, Prof. Hongyin Tao of UCLA took a classic observation of corpus-based studies – the existence, and frequent occurrence, of highly predictable strings or structures – and pointed out a little-noticed aspect of these highly predictable elements: they often involve lacunae, or null elements, where some key component of the meaning is simply left unstated and assumed. An example of this is the English expression under the influence, where “the influence of what?” is often implicit, but understood to be drugs/alcohol. Tao pointed out that collocation patterns may identify the null elements, but that a simplistic application of collocation analysis may fail to yield useful results for expressions containing null elements. Finally, an extension of the analysis to yinxiang, the Chinese equivalent of influence, showed much the same tendencies – including, crucially, the importance of null elements – at work.

The following presentation came from Prof. Gu Yueguo of the Chinese Academy of Social Sciences. Gu is well-known in the field of corpus linguistics for his many projects over the years to develop not just new corpora, but also new types of corpus resources – for example, his exciting development in recent years of novel types of ontology. His presentation at the seminar was very much in this tradition, arguing for a novel type of multimodal corpus for use in the study of child language acquisition.

At this point in proceedings, I was deeply honoured to give my own presentation. One of Richard’s recently-concluded projects involved the application of Douglas Biber’s method of Multidimensional Analysis to translational English as the “Third Code”. In my talk, I presented methodological work which, together with Xianyao Hu, I have recently undertaken to assist this kind of analysis by embedding tools for the MD approach in CQPweb. A shorter version of this talk was subsequently presented at the ICAME conference in Trier at the end of May.

Prof. Xu Hai of Guangdong University of Foreign Studies gave a presentation on the study of Learner Chinese, an issue which was prominent among Richard’s concerns as director of the Lancaster University Confucius Institute. As noted above, Richard has led a project funded by the British Academy, looking at the acquisition of Mandarin Chinese as a foreign language; as a partner on that project, Xu’s presentation of a preliminary report on the Guangwai Lancaster Chinese Learner Corpus was timely indeed. This new learner corpus – already in excess of a million words in size, and consisting of a roughly 60-40 split between written and spoken materials – follows the tradition of the best learner corpora for English by sampling learners with many different national backgrounds, but also, interestingly, includes some longitudinal data. Once complete, the value of this resource for the study of L2 Chinese interlanguage will be incalculable.

The next presentation was another from colleagues of Richard here at Lancaster: Dr. Paul Rayson and Dr. Scott Piao gave a talk on the extension of the UCREL Semantic Analysis System (USAS) to Chinese. This has been accomplished by means of mapping the vast semantic lexicon originally created for English across to Chinese, initially by automatic matching, and secondarily by manual editing. Scott and Paul, with other colleagues including CASS’s Carmen Dayrell, went on to present this work – along with work on other languages – at the prestigious NAACL HLT 2015 conference, in whose proceedings a write-up has been published.

Prof. Jiajin Xu (Beijing Foreign Studies University) then made a presentation on corpus construction for Chinese. This area has, of course, been a major locus of activity by Richard over the years: his Lancaster Corpus of Mandarin Chinese (LCMC), a Mandarin match for the Brown corpus family, is one of the best openly-available linguistic resources for that language, and his ZJU Corpus of Translational Chinese (ZCTC) was a key contribution of his research on translation in Chinese. Xu’s talk presented a range of current work building on that foundation, especially the ToRCH (“Texts of Recent Chinese”) family of corpora – a planned Brown-family-style diachronic sequence of snapshot corpora in Chinese from BFSU, starting with the ToRCH2009 edition. Xu rounded out the talk with some case studies of applications for ToRCH, looking first at recent lexical change in Chinese by comparing ToRCH2009 and LCMC, and then at features of translated language in Chinese by comparing ToRCH2009 and ZCTC.

The last presentation of the day was from Dr. Vittorio Tantucci, who has recently completed his PhD at the Department of Linguistics and English Language at Lancaster, and who specialises in a number of issues in cognitive linguistic analysis including intersubjectivity and evidentiality. His talk addressed specifically the Mandarin evidential marker 过 guo, and the path it took from a verb meaning ‘to get through, to pass by’ to becoming a verbal grammatical element. He argued that this exemplified a path for an evidential marker to originate from a traversative structure – a phenomenon not noted in the literature on this kind of grammaticalisation, which focuses on two other paths of development, from verbal constructions conveying a result or a completion. Vittorio’s work is extremely valuable, not only in its own right but as a demonstration of the role that corpus-based analysis, and cross-linguistic evidence, has to play in linguistic theory. Given Richard’s own work on the grammar and semantics of aspect in Chinese, a celebration of Richard’s career would not have been complete without an illustration of how this trend in current linguistics continues to develop.

All in all, the event was a magnificent tribute to Richard and his highly productive research career, and a potent reminder of how diverse his contributions to the field have actually been, and of their far-reaching impact among practitioners of Chinese corpus linguistics. The large and lively audience certainly seemed to agree with our assessment!

Our deep thanks go out to all the invited speakers, especially those who travelled long distances to attend – our speaker roster stretched from California in the west, to China in the east.

CASS Corpus Linguistics workshop at the University of Caxias do Sul (UCS, Brazil)

Last month at UCS (Brazil), the CASS Corpus Linguistics workshop found a receptive audience who participated actively and engaged enthusiastically in the discussion. The workshop was run on 27-28 May by CASS members Elena Semino, Vaclav Brezina and Carmen Dayrell, and perfectly organised by the local committee, Heloísa Feltes and Ana Pelosi.


From left to right: Carmen Dayrell, Heloísa Feltes, Vaclav Brezina, Elena Semino, and Ana Pelosi

This workshop brought together lecturers, researchers, PhDs and MA research students from various Brazilian universities. It was a positive, invigorating experience for the CASS team and a golden opportunity to discuss the various applications of corpus linguistics methods. We would like to thank UCS for offering all necessary conditions to make this workshop run so smoothly.

The workshop was part of a collaborative project between UK and Brazilian scholars funded by the UK’s ESRC and the Brazilian research agency CONFAP (FAPERGS) which will make use of corpus linguistics techniques to investigate the linguistic representation of urban violence in Brazil. Further details of this project can be found at http://cass.lancs.ac.uk/?page_id=1501.

New CASS Briefing now available — Hate Speech: Crime against Muslims

Hate Speech: Crime against Muslims. The notion of ‘hate crime’ might conjure up an image of premeditated violence perpetrated by a bigoted thug. But in reality, a majority of so-called ‘hate crimes’ are committed with little aforethought by very ordinary people in ordinary circumstances and involve a verbal assault rather than physical attack. This briefing provides the key research findings from the project as it provided important groundwork for a CASS research project launched in 2014 on The management of hateful invective by the courts.


New resources are being added regularly to the new CASS: Briefings tab above, so check back soon.

The heart of the matter …

How wonderful it is to get to the inner workings of the creature you helped bring to life! I’ve just spent a week with the wonderful – and superbly helpful – team at CASS devoting time to matters on the Trinity Lancaster Spoken Corpus.

Normally I work from London, situated in the very 21st-century environment of the web – I plan, discuss and investigate the corpus across the ether with my colleagues in Lancaster. They regularly visit us with updates but the whole ‘system’ – our raison d’être if you like – sits inside a computer. This, of course, does make for very modern research and allows a much wider circle of access and collaboration. But there is nothing like sitting in the same room as colleagues, especially over the period of a few days, to test ideas, to leap connections and to get the neural pathways really firing.


It’s been a stimulating week, not least because we started with the wonderful GraphColl, a new collocation tool which allows the corpus to come to life before our eyes. As the ‘bubbles’ of lexis chase across the screen searching for their partners, they pulse and bounce. Touching one of them lights up more collocations, revealing the mystery of communication. Getting the number right turns out to be critical in producing meaningful data that we can actually read – too loose and we end up with a density we cannot untangle; the less the better seems to be the key. It did occur to me that finally language had produced something that could contribute to the Science Photo Library (https://www.sciencephoto.com/), where GraphColl images could complement the shots of language activity in the brain. I’ve been experimenting with it this week – digging out question words from part of the corpus to find out how patterned they are – more to come.
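
To make the point about thresholds concrete, here is a minimal sketch of window-based collocation scoring with a cut-off. It is an illustration only and not GraphColl's own code: the function name `collocates`, the use of Mutual Information as the association measure, the window size and the default cut-off values are all assumptions chosen for the example.

```python
import math
from collections import Counter


def collocates(tokens, node, span=5, min_mi=3.0, min_freq=2):
    """Return {collocate: MI score} for words found near `node`.

    Hypothetical helper for illustration: `span` is the number of tokens
    searched on each side of the node, and `min_mi` / `min_freq` are the
    thresholds that decide which collocates are kept.
    """
    n = len(tokens)
    word_freq = Counter(tokens)
    co_freq = Counter()

    # Count every token that falls inside the window around each node hit.
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo, hi = max(0, i - span), min(n, i + span + 1)
        for j in range(lo, hi):
            if j != i:
                co_freq[tokens[j]] += 1

    results = {}
    window = 2 * span
    for word, observed in co_freq.items():
        if observed < min_freq:
            continue
        # Co-occurrence frequency expected if the two words were independent.
        expected = word_freq[node] * word_freq[word] * window / n
        mi = math.log2(observed / expected)
        if mi >= min_mi:
            results[word] = mi
    return results
```

Lowering `min_mi` or `min_freq` lets more collocates through, which corresponds to the "too loose" setting described above; raising them leaves fewer, stronger links to read.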

We’ve also been able to put more flesh on the bones of an important project developed by Vaclav Brezina – how to make the corpus meaningful for teachers (and students). Although we live in an era where the public benefit of science is rightly foregrounded, it can be hard sometimes to ‘translate’ the science and complexity of the supporting technology so that it is of real value to the very people who created the corpus. Vaclav has been preparing a series of extracts of corpus data that can come full circle back into the classroom by showing teachers and their students the way that language works – not in the textbooks but in real ‘lingua franca’ life. In other words, demonstrating the language that successful learners use to communicate in global contexts. This is going to be turned into a series of teaching materials with the quality and relevance being assured by crowdsourcing teaching activities from the teachers themselves.

Collocates of time in the GESE interactive task

Meanwhile I am impressed by how far the corpus – this big data – is able to support Trinity by helping to build robust validity arguments for the GESE test. This is critical in helping Trinity’s core audience – our test takers – to answer their own questions: why should I do this test, what will the test demonstrate, what effect will it have on my learning, and is it fair? All in all, a very productive week.

CASS PhD student in Moscow to attend the XVI April International Academic Conference on Economic and Social Development

I recently got the opportunity to travel to Moscow to attend the XVI April International Academic Conference on Economic and Social Development at the National Research University – Higher School of Economics (HSE). This conference covered a wide variety of fields including Sociology, Geography, and Technology, and, on the last day of the conference, there was a seminar specifically for Linguistics PhD students. The aim of this seminar was to allow students from Russia and other countries to exchange ideas, and to introduce students from around the world to HSE.

At the seminar, there were presentations from 10 PhD students and these covered a variety of Linguistics topics including Grammar, Semantics, Sign Language, and Cognitive Linguistics. There were also some presentations on Corpus Linguistics: one which discussed semantic role labelling for the Russian language based on the Russian FrameBank, and another which discussed building a corpus of Soviet poetry. I found it interesting to see corpus analyses based on the Russian language, and it was also interesting to see the use of the ‘web as corpus’. This introduced me to tools that I haven’t used before, such as the Google Books Ngram Viewer.

In the afternoon, I gave a presentation entitled The collocation hypothesis: Evidence from self-paced reading. This was the first time I had ever given a conference presentation and I was really pleased to have an audience that seemed interested in my work. The audience was composed of PhD students, some undergraduate students from the Linguistics Department at HSE, researchers from other fields who had presented at the conference on the previous days, as well as a few senior academics who gave me some really useful feedback.

The conference was held at the central building of HSE and, the day before the seminar, an MA student in Computational Linguistics kindly gave me a tour of the Linguistics Department. It was interesting to see that their classes are all seminar-based and I particularly liked the way they had a common room where all members of the department, including undergraduates, postgraduates, and lecturers, go between classes in order to socialise or do work. Here, I got the chance to speak to some undergraduates and postgraduates and I was shown some of the corpora that were compiled at that department, such as the Corpus of Modern Yiddish, the Bashkir Poetic Corpus, and the Russian Learner Corpus of Academic Writing. I was also told about a project called Tolstoy Digital, which involved making a corpus of Tolstoy’s works. It was interesting to hear about the unique problems that were faced when compiling this corpus. For instance, Tolstoy used an older orthography so this had to be translated to the modern form before the corpus could be tagged and parsed.

When speaking to members of the department, it was also interesting to discuss how some of their work links to some of the work carried out at CASS and the Linguistics Department at Lancaster University. For example, Elena Semino’s work on pain questionnaires seemed to link closely to an article written by members of HSE entitled Towards a typology of pain predicates (Reznikova et al. 2012). This article discusses the way in which the semantic domain of pain is largely composed of words borrowed from other semantic domains.

After showing me around the department, the MA student, Natalia, showed me around some of the main sights in central Moscow. I really appreciated this as I got to see some of Moscow from a local’s perspective as well as getting to visit some of the key sights that I was looking forward to seeing, such as the Bolshoi Theatre. Whilst in Moscow, I also went to see Swan Lake at the Kremlin Theatre of Classical Russian Ballet. This was an amazing experience because I had always wanted to see a Russian ballet and, although I had already seen Swan Lake several times, this was definitely the best version I had ever seen. Overall I had a brilliant time in Moscow and I am really grateful to the Higher School of Economics for funding and organising the trip.

New CASS Briefing now available — Language surrounding poverty in early modern England: Constructing seventeenth-century beggars and vagrants

Language surrounding poverty in early modern England: Constructing seventeenth-century beggars and vagrants. This briefing concentrates upon attitudes towards a subset of poor people – a group who might today be termed beggars or vagrants. Seventeenth-century vagrants were a marginalised group: they were overwhelmingly illiterate and politically powerless. By undertaking a study of them, we hope to improve our understanding of a people who were effectively voiceless in their own time. On a practical level, it is important to understand changing discourses on the poor because legislative change was influenced by changing public perceptions of poverty.



Coming this year: Corpora and Discourse Studies (Palgrave Advances in Language and Linguistics)

Three members of CASS have contributed chapters to a new volume in the Palgrave Advances in Language and Linguistics series. Corpora and Discourse Studies will be released later this year.


The growing availability of large collections of language texts has expanded our horizons for language analysis, enabling the swift analysis of millions of words of data, aided by computational methods. This edited collection of chapters contains examples of such contemporary research which uses corpus linguistics to carry out discourse analysis. The book takes an inclusive view of the meaning of discourse, covering different text-types or modes of language, including discourse as both social practice and as ideology or representation. Authors examine a range of spoken, written, multimodal and electronic corpora covering themes which include health, academic writing, social class, ethnicity, gender, television narrative, news, Early Modern English and political speech. The chapters showcase the variety of qualitative and quantitative tools and methods that this new generation of discourse analysts are combining, offering a set of compelling models for future corpus-based research in discourse.

Table of Contents:

  1. Introduction; Paul Baker and Tony McEnery
  2. E-Language: Communication in the Digital Age; Dawn Knight
  3. Beyond Monomodal Spoken Corpora: Using a Field Tracker to Analyse Participants’ Speech at the British Art Show; Svenja Adolphs, Dawn Knight and Ronald Carter
  4. Corpus-assisted Multimodal Discourse Analysis of Television and Film Narratives; Monika Bednarek
  5. Analysing Discourse Markers in Spoken Corpora: Actually as a Case Study; Karin Aijmer
  6. Discursive Constructions of the Environment in American Presidential Speeches 1960-2013: A Diachronic Corpus-assisted Study; Cinzia Bevitori
  7. Health Communication and Corpus Linguistics: Using Corpus Tools to Analyse Eating Disorder Discourse Online; Daniel Hunt and Kevin Harvey
  8. Multi-Dimensional Analysis of Academic Discourse; Jack A. Hardy
  9. Thinking About the News: Thought Presentation in Early Modern English News Writing; Brian Walker and Dan McIntyre
  10. The Use of Corpus Analysis in a Multi-perspectival Study of Creative Practice; Darryl Hocking
  11. Corpus-assisted Comparative Case Studies of Representations of the Arab World; Alan Partington
  12. Who Benefits When Discourse Gets Democratised? Analysing a Twitter Corpus Around the British Benefits Street Debate; Paul Baker and Tony McEnery
  13. Representations of Gender and Agency in the Harry Potter Series; Sally Hunt
  14. Filtering the Flood: Semantic Tagging as a Method of Identifying Salient Discourse Topics in a Large Corpus of Hurricane Katrina Reportage; Amanda Potts

Centre Vacancy: Senior Research Associate

Linguistics & English Language
Salary: £32,277 to £37,394
Closing Date: Sunday 03 May 2015
Interview Date: To be confirmed
Reference: A1198

The Centre for Corpus Approaches to Social Science, funded by the ESRC, is seeking to appoint to an 18-month research position to work on ‘discourses on distressed communities’. This position is available from 1 May 2015 or as soon as possible thereafter.

You must have relevant research experience in corpus linguistics and an ability to engage with research within human geography.

You will pursue research on developing and applying existing and new approaches to the use of corpus linguistics within the social sciences. This will focus on discourses around the UK’s distressed communities, how these are represented in the news media, and whether areas with high levels of poverty and marginalization are represented differently from other areas. The project will centre on creating and analyzing a corpus of contemporary newspaper material. It will additionally draw on techniques being developed by the Spatial Humanities project; knowledge of geographical information systems (GIS) or of analyzing census data is therefore desirable, although training can be provided.

You will join an interdisciplinary team of internationally renowned researchers within the Departments of Linguistics and English Language and History. This project is supervised by Prof Ian Gregory within the overall Centre. You will be offered excellent career progression opportunities through the ESRC Centre.

Informal enquiries may be made to Professor Ian Gregory, i.gregory@lancaster.ac.uk

Further information on the Centre for Corpus Approaches to Social Science is available from: http://cass.lancs.ac.uk. The History Department’s website can be found at: http://www.lancaster.ac.uk/fass/history and the Spatial Humanities project’s website is at: http://www.lancaster.ac.uk/spatialhum

We welcome applications from people in all diversity groups.

Further details:

Apply through the Lancaster University website. 

Three CASS articles for special issue of Discourse & Communication available Open Access now

Discourse & Communication 9(2) will be an exciting Special Issue containing a number of articles which examine corpus-based approaches to the analysis of media discourse. CASS members Tony McEnery, Paul Baker, Amanda Potts, Mark McGlashan, and Robbie Love have contributed to three of these articles, all of which are now available for Open Access early download. Read abstracts of the articles below and follow links to download full PDFs of the works. More interesting papers are also available OnlineFirst for those with subscriptions to Discourse & Communication.


Picking the right cherries? A comparison of corpus-based and qualitative analyses of news articles about masculinity 

Paul Baker (Lancaster University, UK) and Erez Levon (Queen Mary University of London, UK)

As a way of comparing qualitative and quantitative approaches to critical discourse analysis (CDA), two analysts independently examined similar datasets of newspaper articles in order to address the research question ‘How are different types of men represented in the British press?’. One analyst used a 41.5 million word corpus of articles, while the other focused on a down-sampled set of 51 articles from the same corpus. The two ensuing research reports were then critically compared in order to elicit shared and unique findings and to highlight strengths and weaknesses between the two approaches. This article concludes that an effective form of CDA would be one where different forms of researcher expertise are carried out as separate components of a larger project, then combined as a way of triangulation.


How can computer-based methods help researchers to investigate news values in large datasets? A corpus linguistic study of the construction of newsworthiness in the reporting on Hurricane Katrina

Amanda Potts (Lancaster University, UK), Monika Bednarek (University of Sydney, Australia), and Helen Caple (University of New South Wales, Australia)

This article uses a 36-million word corpus of news reporting on Hurricane Katrina in the United States to explore how computer-based methods can help researchers to investigate the construction of newsworthiness. It makes use of Bednarek and Caple’s discursive approach to the analysis of news values, and is both exploratory and evaluative in nature. One aim is to test and evaluate the integration of corpus techniques in applying discursive news values analysis (DNVA). We employ and evaluate corpus techniques that have not been tested previously in relation to the large-scale analysis of news values. These techniques include tagged lemma frequencies, collocation, key part-of-speech tags (POStags) and key semantic tags. A secondary aim is to gain insights into how a specific happening – Hurricane Katrina – was linguistically constructed as newsworthy in major American news media outlets, thus also making a contribution to ecolinguistics.
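
As a rough illustration of what identifying "key" items involves, the sketch below compares the frequency of each item in a study corpus against a reference corpus using the log-likelihood statistic, a common choice for keyness. The abstract does not say which statistic the authors used, so the measure, the function names `log_likelihood` and `key_items`, and the 6.63 cut-off (the chi-square critical value for p < 0.01 with one degree of freedom) are assumptions made for this example. The items can be word forms, lemmas, POS tags or semantic tags.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood for one item.

    a, b: frequency of the item in the study and reference corpora.
    c, d: total size of the study and reference corpora.
    """
    # Frequencies expected if the item were equally common in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def key_items(study, reference, threshold=6.63):
    """Return (item, score) pairs overused in `study`, strongest first."""
    s, r = Counter(study), Counter(reference)
    c, d = len(study), len(reference)
    keys = {}
    for item, a in s.items():
        b = r[item]
        if a / c > b / d:  # keep only items relatively more frequent in the study corpus
            score = log_likelihood(a, b, c, d)
            if score >= threshold:
                keys[item] = score
    return sorted(keys.items(), key=lambda kv: kv[1], reverse=True)
```

For example, calling `key_items` on two lists of semantic tags returns the tags that are unusually frequent in the first list relative to the second.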


Press and social media reaction to ideologically inspired murder: The case of Lee Rigby

Tony McEnery (Lancaster University, UK), Mark McGlashan (Lancaster University, UK), and Robbie Love (Lancaster University, UK)

This article analyses reaction to the ideologically inspired murder of a soldier, Lee Rigby, in central London by two converts to Islam, Michael Adebowale and Michael Adebolajo. The focus of the analysis is upon the contrast between how the event was reacted to by the UK National Press and on social media. To explore this contrast, we undertook a corpus-assisted discourse analysis to look at three periods during the event: the initial attack, the verdict of the subsequent trial and the sentencing of the murderers. To do this, we constructed and analysed corpora of press and Twitter coverage of the attack, the conviction of the suspects and the sentencing of them. The analysis shows that social media and the press are intertwined, with the press exerting a notable influence through social media, but social media not always being led by the press. When looking at social media reaction to such an event as this, analysts should always consider the role that the press are playing in forming that discourse.