Is Academic Writing Becoming More Colloquial?

Have you noticed that academic writing in books and journals seems less formal than it used to? Preliminary data from the Written BNC2014 shows that you may be right!

Some early data from the academic journals and academic books sections of the new corpus has been analysed to find out whether academic writing has become more colloquial since the 1990s. Colloquialisation is “a tendency for features of the conversational spoken language to infiltrate and spread in the written language” (Leech, 2002: 72). The colloquialisation of language can make messages more easily understood by the general public because, whilst not everybody is familiar with the specifics of academic language, everyone is familiar with spoken language. In order to investigate the colloquialisation of academic writing, the frequencies of several linguistic features which have been associated with colloquialisation were compared in academic writing in the BNC1994 and the BNC2014.

Results show that, of the eleven features studied, five have shown large changes in frequency between the BNC1994 and the BNC2014, pointing to the colloquialisation of academic writing. First and second person pronouns, verb contractions, and negative contractions have previously been found to be strongly associated with spoken language. These features have all increased in academic language between 1994 and 2014. Passive constructions and relative pronouns have previously been found to be strongly associated with written language, and are not often used in spoken language. This analysis shows that both of these features have decreased in frequency in academic language in the BNC2014.
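To make the comparison concrete, here is a minimal sketch of how a feature's frequency might be compared across two corpora of different sizes. It assumes the usual approach of normalising raw counts to a rate per million tokens before calculating a percentage change; the post does not say exactly how the figures were computed, and the counts and corpus sizes below are invented purely for illustration.

```python
def per_million(count: int, corpus_size: int) -> float:
    """Normalised frequency: occurrences per million tokens."""
    return count * 1_000_000 / corpus_size


def percent_change(old: float, new: float) -> float:
    """Percentage change from the old to the new normalised frequency."""
    return (new - old) / old * 100


# Hypothetical counts and corpus sizes, NOT taken from the BNC1994 or BNC2014.
old_freq = per_million(count=1_200, corpus_size=15_000_000)
new_freq = per_million(count=2_400, corpus_size=20_000_000)
change = percent_change(old_freq, new_freq)

print(f"{old_freq:.1f} -> {new_freq:.1f} per million ({change:+.1f}%)")
# prints "80.0 -> 120.0 per million (+50.0%)"
```

Normalising first matters because the raw count here doubles, while the rate per million rises by only half once the larger corpus size is taken into account.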

Figure 1: Frequency increases indicating the colloquialisation of academic language.

Figure 2: Frequency decreases indicating the colloquialisation of academic language.

These frequency changes were also compared for each genre of academic writing separately. The genres studied were: humanities & arts, social science, politics, law & education, medicine, natural science, and technology & engineering. An interesting difference between some of these genres emerged. It seems that the ‘hard’ sciences (medicine, natural science, and technology & engineering) have shown much larger changes in some of the linguistic features studied than the other genres have. For example, Figure 3 shows the percentage increase in verb contractions for each genre, and clearly shows a difference between the ‘hard’ sciences and the social sciences and humanities subjects.


Figure 3: % increases in the frequency of the use of verb contractions between 1994 and 2014 for each genre of academic writing.

This may lead you to think that medicine, natural science, and technology & engineering writing has become more colloquial than the other genres, but this is in fact not the case. Looking more closely at the data shows us that these ‘hard’ science genres were actually much less colloquial than the other genres in the 1990s, and that the large change seen here is actually a symptom of all genres becoming more similar in their use of these features. In other words, some genres have not become more colloquial than others; they have simply had to change more in order for all of the genres to become more alike.
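The effect of the starting point on a percentage increase can be shown with a small sketch. The genre labels and frequencies below are hypothetical and chosen only to illustrate the arithmetic; they are not figures from the BNC1994 or BNC2014.

```python
def percent_change(old: float, new: float) -> float:
    """Percentage change from the old to the new frequency."""
    return (new - old) / old * 100


# Hypothetical normalised frequencies (per million words), for illustration only.
freq_1994 = {"genre A (low baseline)": 10.0, "genre B (high baseline)": 40.0}
freq_2014 = {"genre A (low baseline)": 45.0, "genre B (high baseline)": 50.0}

for genre in freq_1994:
    old, new = freq_1994[genre], freq_2014[genre]
    print(f"{genre}: {old} -> {new} ({percent_change(old, new):+.0f}%)")

# The gap between the two genres narrows even though neither overtakes the other.
gap_1994 = max(freq_1994.values()) - min(freq_1994.values())
gap_2014 = max(freq_2014.values()) - min(freq_2014.values())
print(f"gap between genres: {gap_1994} -> {gap_2014}")
```

In this invented case genre A rises by 350% and genre B by only 25%, yet genre A still ends up with the lower frequency. The large percentage reflects its low starting point, while the gap between the two genres shrinks from 30 to 5.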

So it seems from this analysis that, in some respects at least, academic language has become more colloquial since the 1990s. The following is a typical example of academic writing in the 1990s, taken from a sample of a natural sciences book in the BNC1994. It avoids first and second person pronouns and contractions (which have increased in use in the BNC2014), and uses a passive construction (the use of which has decreased in the BNC2014).

Experimentally one cannot set up just this configuration because of the difficulty in imposing constant concentration boundary conditions (Section 14.3). In general, the most readily practicable experiments are ones in which an initial density distribution is set up and there is then some evolution of the configuration during the course of the experiment.

It is much more common nowadays to see examples such as the following, taken from an academic natural sciences book in the BNC2014. This example contains active sentence constructions, first person pronouns, verb contractions, negative contractions, and a question.

No doubt people might object in further ways, but in the end nearly all these replies boil down to the first one I discussed above. I’d like to return to it and ponder a somewhat more aggressive version, one that might reveal the stakes of this discussion even more clearly. Very well, someone might say. Not reproducing may make sense for most people, but my partner and I are well-educated, well-off, and capable of protecting our children from whatever happens down the road. Why shouldn’t we have children if we want to?

It will certainly be interesting to see if this trend of colloquialisation can be seen in other genres of writing in the BNC2014!


Would you like to contribute to the Written BNC2014?

We are looking for native speakers of British English to submit their student essays, emails, and Facebook and WhatsApp messages for inclusion in the corpus! To find out more, and to get involved, click here. All contributors will be fully credited in the corpus documentation.

CASS go to ICAME38!

Researchers from CASS recently attended the ICAME38 conference at Charles University in Prague. Luckily, we arrived in Prague a day early, which gave us plenty of time to explore the city. The weather was sunny, so we walked to Wenceslas Square, and then took the lift to the top of the Old Town Hall Tower to enjoy the views over the city.

The following day, it was time to begin the conference! Over the course of the event, seven CASS members presented their research (you can view full abstracts of all talks here). Up first was Robbie Love, presenting “FUCK in spoken British English revisited with the Spoken BNC2014”. By replicating the approaches of McEnery & Xiao (2004) on the new data contained in the Spoken BNC2014, Robbie found, among other things, that FUCK is now used equally by men and women, and that use of FUCK peaks when speakers are in their 20s and then decreases with age, apart from the 60-69 group, which has a higher frequency than the 50-59 group.

Also discussing the BNC2014 project was Abi Hawtin, who presented “The British National Corpus Revisited: Developing parameters for Written BNC2014.” Abi discussed the progress on the project so far, and gave the audience a chance to look at the sampling frame which has been designed for the corpus. Abi also highlighted the difficulty of collecting certain text types, particularly published books.

Amelia Joulain-Jay presented “Describing collocation patterns in OCR data: are MI and LL reliable?” Amelia discussed the fact that data which has been digitized using OCR procedures often has low levels of accuracy, and how this can affect corpus analysis. Amelia tested the reliability of Mutual Information statistics and Log Likelihood statistics when working with OCR data, and found that, among other things, Mutual Information and Log Likelihood attract high rates of false positives. However, she also found that correcting OCR data using Overproof makes a positive difference for both statistics.
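For readers unfamiliar with the two statistics, here is a minimal sketch using their standard textbook definitions, calculated from a 2×2 contingency table of observed co-occurrence counts. The post does not give the exact formulation used in Amelia's study, so treat this as an assumption; the function names and all counts are hypothetical.

```python
import math


def mutual_information(o11: int, r1: int, c1: int, n: int) -> float:
    """MI score: log2 of observed over expected co-occurrence frequency.

    o11 = co-occurrences of node and collocate, r1 = node frequency,
    c1 = collocate frequency, n = total number of tokens.
    """
    e11 = r1 * c1 / n
    return math.log2(o11 / e11)


def log_likelihood(o11: int, r1: int, c1: int, n: int) -> float:
    """Log-likelihood (G2) over all four cells of the contingency table."""
    r2, c2 = n - r1, n - c1
    observed = [o11, r1 - o11, c1 - o11, n - r1 - c1 + o11]
    expected = [r1 * c1 / n, r1 * c2 / n, r2 * c1 / n, r2 * c2 / n]
    # Cells with an observed count of zero contribute nothing to the sum.
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


# Hypothetical counts, for illustration only.
mi = mutual_information(o11=20, r1=100, c1=200, n=100_000)
ll = log_likelihood(o11=20, r1=100, c1=200, n=100_000)
print(f"MI = {mi:.2f}, LL = {ll:.2f}")
```

Both statistics compare how often a pair actually co-occurs with how often it would be expected to co-occur by chance, so both come out at zero when the observed and expected counts match exactly.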

CASS director, Andrew Hardie, also presented research using OCR data. He gave a talk titled “Plotting and comparing corpus lexical growth curves as an assessment of OCR quality in historical news data”. Andrew further drew our attention to the number of errors, or ‘noise’, in OCR data, and showed that if a graph is constructed of the number of tokens observed versus the count of types at intervals (say, every 10,000 tokens), a curve characteristic of lexical growth over the span of a given corpus emerges. Andrew showed that visual comparison of lexical growth curves among historical collections, or to modern corpora, therefore generates a good impression of the relative extent of OCR noise, and thus some estimate of how much such noise will impede analysis.
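A lexical growth curve of this kind can be sketched in a few lines. This is only an illustration of the idea as described in the talk summary, not Andrew's own implementation; the function name, the toy token lists and the tiny interval are assumptions made so that the example fits on the page.

```python
def lexical_growth(tokens, interval=10_000):
    """Return (tokens seen, distinct types seen) pairs at every `interval` tokens."""
    seen = set()
    points = []
    for position, token in enumerate(tokens, start=1):
        seen.add(token)
        if position % interval == 0:
            points.append((position, len(seen)))
    return points


# Toy data with an interval of 3 tokens; a real corpus would use a much larger interval.
clean = "the cat sat on the mat the cat ran".split()
# The same text with two invented OCR-style misreadings ("c4t", "tbe").
noisy = "the c4t sat on tbe mat the cat ran".split()

print(lexical_growth(clean, interval=3))  # [(3, 3), (6, 5), (9, 6)]
print(lexical_growth(noisy, interval=3))  # [(3, 3), (6, 6), (9, 8)]
```

In this toy case each misreading adds a spurious new type, so the noisy curve climbs faster than the clean one. Plotting the pairs for two collections side by side gives the kind of visual comparison described above.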

Also presenting was Dana Gablasova, who discussed “A corpus-based approach to the expression of subjectivity in L2 spoken English: The case of ‘I + verb’ construction”. Dana used the Trinity Lancaster Corpus (TLC) to investigate the ‘I + verb’ construction in L1 Spanish and Italian speakers aged over 20 years. Dana found that, as proficiency increased, the frequency of emotive verbs decreased while the frequency of epistemic verbs increased considerably. The study also identified the most frequent cognitive and emotive verbs and the trends in their use according to the proficiency level of L2 users.

Vaclav Brezina (and Matt Timperley, who was unfortunately not able to attend the conference) gave a software demonstration of #LancsBox – a new-generation corpus analysis tool developed at CASS. Vaclav showed that #LancsBox can:

  • Search, sort and filter examples of language use.
  • Compare frequency of words and phrases in multiple corpora and subcorpora.
  • Identify and visualise meaning associations in language (collocations).
  • Compute and visualize keywords.
  • Use a simple but powerful interface.
  • Support a number of advanced features such as customisable statistical measures.

#LancsBox can be downloaded for free from the tool website http://corpora.lancs.ac.uk/lancsbox.

Dana and Vaclav also gave a presentation together, titled “MI-score-based collocations in language learning research: A critical evaluation.” Dana and Vaclav identified several issues in the use of MI-score as a measure in language learning research (LLR), and used data from the BNC and TLC to:

  • place the MI-score in the context of other similar association measures and discuss the similarities and differences directly relevant to LLR
  • propose general principles for selection of association measures in LLR.

Finally, former CASS senior research associate Laura Paterson, who recently moved to a lectureship at the Open University, presented “Visualising corpora using Geographical Text Analysis (GTA): (Un)employment in the UK, a case study”, which stemmed from her work on the CASS Distressed Communities project. Laura showed how GTA can be used to generate maps from concordance lines. She showed lots of interesting data visualisations and highlighted the way in which GTA allows the researcher to visualise their corpus and adds a consideration of physical space to language analysis.

Aside from all of the fascinating talks, ICAME38 also had a brilliant social programme. We were able to go on two boat trips along the river. The first gave us brilliant views of the city, and the second allowed us to get much closer to the bridges and buildings which line the river. The Gala dinner was also great fun – we had a linguistics-themed menu and, best of all, an ABBA tribute band!

Thank you to all of the organisers of ICAME38 for such an enjoyable and well-organised conference!

 

Registration now open for Lancaster Summer Schools in Corpus Linguistics and other Digital Methods!


We are pleased to announce that we will be running our hugely popular summer schools again in 2017! There will be six free training events covering the techniques of corpus linguistics, computational analysis of language and geographical information systems. The schools include both lectures and practical sessions that introduce the latest developments in the field and practical applications of cutting-edge analytical techniques. The summer schools are taught by leading experts in the field from both CASS and other departments and institutions (CASS Challenge Panel).

The summer schools running in 2017 are:

  • Corpus linguistics for Language studies
  • Corpus linguistics for Social Science
  • Corpus linguistics for the Humanities
  • Statistics for Corpus linguistics
  • Geographical information systems for the Digital Humanities
  • Corpus-based NLP

The summer schools will take place over 4 days (27th – 30th June 2017) and are free to attend. Click here for more information and to register.

Dealing with Optical Character Recognition errors in Victorian newspapers

CASS PhD student, Amelia Joulain-Jay, has been researching to what extent OCR errors are a problem when researching historical texts, and whether these errors can be corrected. Amelia’s work has recently been featured in a very interesting blog post on the British Library’s website – you can read the full post here.

 

Tracking terrorists who leave a technological trail.

Dr Sheryl Prentice’s work on using technology to aid in the detection of terrorists has been gaining a lot of attention in the media this week! Sheryl’s discussion of the different ways in which technology can be used to tackle the issue of terrorism and how effective these methods are was originally published in The Conversation, and then republished by the ‘i’ newspaper on 23rd June 2016. You can read the original article here.

Introducing Yufang Qian to CASS

CASS is delighted to welcome visiting researcher Yufang Qian to the centre, where she will be working on a project exploring the representation of Chinese medicine in British historical news texts over the last 200 years. Continue reading to find out more about Yufang and the research which she will be undertaking!


Yufang

In 2009, Yufang Qian obtained her PhD at Lancaster University with a dissertation on corpus-based discourse studies, under the supervision of Professors Tony McEnery and Paul Baker. She then returned to Zhejiang University of Media and Communications (ZUMC) and was appointed Professor in 2011.

Yufang is committed to popularizing the combination of corpus and discourse approaches in China. She has taught corpus linguistics and media discourse at the ZUMC to students at all levels and supervised more than 50 students’ dissertations relating to corpus-based discourse studies, in disciplines as diverse as communication studies, education, sociology, psychology and politics. The students have then either pursued further education in this area in the UK, USA, Japan, South Korea, and Hong Kong, or have used the expertise they have gained in various institutions and organizations in China.

In 2010 Yufang published ‘Corpus and Critical Discourse Analysis’ in the journal Foreign Language Teaching and Research, the first paper to introduce corpus-based discourse analysis in a Chinese journal. To date, it has been cited 48 times and downloaded 3515 times. In the past few years she has published nearly two dozen journal articles on corpus-based media discourse analysis. Her PhD thesis, Discursive constructions around terrorism in the People’s Daily and The Sun before and after 9.11 (Oxford: Peter Lang 2010), won the Third Prize in the Sixth Outstanding Achievement Awards for Research in Humanities and Social Sciences, conferred by the Ministry of Education in 2013, the top governmental award in social science in China.

To explain and promote the application of the corpus-based discourse approach, Yufang has spoken at many national and international conferences and has given lectures at more than a dozen universities in China. She is Founding Director of the Research Center for Discourse and Communications at the ZUMC, the first of its kind in China. She is principal investigator for many research projects, such as ‘Discursive constructions around the low carbon economy in the press of China, the UK and the US’, funded by the Ministry of Education; and ‘A corpus-based comparative study of Western and Chinese political discourse analysis’, funded by the National Social Science Foundation. She is also co-principal investigator of the project entitled ‘A comparative study of the discourse system in Chinese dream films’, funded by the National Social Science Foundation.

Yufang’s comparative perspective is evident from her early paper, ‘Contrasting signals of politeness between Western and Eastern countries’, published in Education in China (ed. E Fizette; Fenton, MI: Hana Guild, 1993). Since 2014, she has been working with CCPN Global (China in Comparative Perspective Network Global, an affiliate member of the Academy of Social Sciences, UK) to develop a project entitled ‘Corpus approaches for Chinese social science (CACSS)’. She is organizing a panel on ‘Corpus approaches to governance in the context of climate change’ at the 3rd Global China Dialogue on 2 December 2016 at the British Academy.

Yufang has recently returned to her alma mater, Lancaster University, as a visiting researcher, where she will work with Professor McEnery on a project exploring the representation of Chinese medicine in British historical news texts over the last 200 years. This diachronic observation of discourse on Chinese medicine is significant in that it will provide specific evidence of the media’s role in public health vis-à-vis the use of traditional Chinese medicine in the West. It is hoped that the findings of this study will help bridge the gap between Western and Chinese medicine, both of which play a role in serving public health.

FireAnt is making headlines!

FireAnt, a tool for extracting, visualising and exporting social media data, is making headlines! The tool, developed by Claire Hardaker and Laurence Anthony at CASS, has been noted by the Daily Mail for its abilities to “hunt down terrorists and trolls”. We’re delighted that FireAnt is being recognised for its capabilities in social media data analysis, and that this is being illustrated to the public in mainstream news.

You can read the article here.

You can read more about FireAnt and its development here and here.

News: Professor John Urry

CASS is extremely sorry to hear of the death of Professor John Urry. We have lost a very distinguished and enthusiastic member of our team, and he will be greatly missed by all at the centre. You can read more about John’s life and work here.

CASS PhD Student Awarded Gale Dissertation Research Fellowship!

CASS is delighted that the Gale Dissertation Research Fellowship for research to be undertaken in 2016 has been awarded to Amelia Joulain-Jay, a PhD student at CASS, for her work on using Geographical Information Systems and Corpus Linguistics methods to investigate how places were represented in nineteenth-century British newspapers.

The Gale Dissertation Research Fellowship is awarded by the Research Society for Victorian Periodicals (RSVP) in support of dissertation research that makes substantial use of full-text digitized collections of 19th-century British magazines and newspapers. The Fellowship aims to support historical and literary research that deepens our understanding of the 19th-century British press in all its rich variety, and also encourages the scholarly use of collections of full-text digital facsimiles of these primary sources in aid of that research. A prize of $1500 will be awarded, together with one year’s passworded subscription to selected digital collections from Gale, including 19th Century UK Periodicals and 19th Century British Library Newspapers.

Congratulations to CASS’s Professor Ram-Prasad

Congratulations to CASS’s Professor Ram-Prasad, who has been announced as the winner of the ‘Best Book in Hindu-Christian Studies (2011-2015)’ award for the book ‘Divine Self, Human Self: The Philosophy of Being in Two Gita Commentaries’ (Bloomsbury, 2013). The Society for Hindu-Christian Studies will hold a panel discussion of Professor Ram-Prasad’s book at the November 2016 annual meeting in San Antonio.