Corpus compilation: working paper now available

We are pleased to announce that the CASS Corpus on Urban Violence in Brazil is now ready to be analysed. It contains 5,127 articles (1,778,282 words) published between January and December 2014 by four Brazilian newspapers: Folha de São Paulo, Estado de São Paulo, Zero Hora and Pioneiro.

This working paper explains how the corpus was compiled. It describes the selection of sources and individual texts and the preparation of the texts for processing with corpus linguistics techniques, and concludes with an overview of the corpus’s content.

Big data media analysis and the representation of urban violence in Brazil: Kick-off meeting


The first meeting of the project took place earlier this month at CASS, Lancaster. This kick-off meeting brought together the Brazilian researchers Professors Heloísa Pedroso de Moraes Feltes (UCS) and Ana Cristina Pelosi (UNISC/UFC) and the CASS team (Professors Elena Semino and Tony McEnery, and Dr Carmen Dayrell) to plan the project’s activities and discuss the next steps.

The meeting was an excellent opportunity to discuss the partners’ roles and activities in the project and to clarify how CASS can provide the Brazilian researchers with the expertise needed for a corpus investigation. A key decision towards this goal was to run a two-day Workshop in Corpus Linguistics in Brazil. It will be run by the CASS team, with additional expertise from Dr Vaclav Brezina, in the last week of May.

The workshop aims to reach a wider audience than the Brazilian research team alone. It will be open to their colleagues, graduate and undergraduate students, and anyone interested in learning about corpus linguistics methods and tools and using them in their research.

We are all looking forward to that!

New CASS project: Big data media analysis and the representation of urban violence in Brazil

A new project in CASS has been funded jointly by the UK’s Economic and Social Research Council and the Brazilian research agency CONFAP. The project will involve a collaboration between two Lancaster academics (Professors Elena Semino and Tony McEnery) and two Brazilian academics: Professor Heloísa Pedroso de Moraes Feltes (University of Caxias do Sul) and Professor Ana Cristina Pelosi (University of Santa Cruz do Sul and Federal University of Ceara). The team will employ corpus methods to investigate the linguistic representation of urban violence in Brazil.

Urban violence is a major problem in Brazil: the average citizen is affected by acts of violence, more or less directly, on a daily basis. This creates a general state of fear and insecurity among the population, but, at the same time, may promote a sense of empathy with the less privileged classes in Brazil. Urban violence is also a regular topic in daily conversations and news media, so that people’s perceptions of the nature of this phenomenon are partly mediated by discourse. In particular, daily press reports of acts of violence may affect people’s views and attitudes in ways which may or may not be consistent with the actual incidence, forms and causes of violence.

This collaborative project will investigate the linguistic representation of urban violence in Brazil by applying the methods of Corpus Linguistics to two corpora:

  1. The existing transcripts of two focus groups on living with urban violence conducted in Fortaleza, Brazil, for a total of approximately 20,000 words;
  2. A new 2-million-word corpus of news reports in the Brazilian press, to be constructed as part of the partnership.

The linguistic representation of urban violence in the two corpora will be investigated by analysing lexical and semantic concordances, collocational patterns and keywords. The two corpora will also be compared, in order to identify similarities and differences in the types of violence that are primarily talked about and in how they are linguistically represented.
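As a rough illustration of the techniques named above, the sketch below builds a keyword-in-context concordance and ranks keywords with a log-likelihood score, a statistic commonly used for keyness in corpus linguistics. It is not the project's actual pipeline, which this announcement does not describe. The function names, the simple regular-expression tokenizer and the toy English sentences are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word tokens (a deliberately simple, assumed tokenizer)."""
    return re.findall(r"\w+", text.lower())


def concordance(tokens, node, window=3):
    """Return keyword-in-context lines: up to `window` tokens either side of each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood score for a word seen freq_a times in corpus A and freq_b times in corpus B."""
    total = size_a + size_b
    expected_a = size_a * (freq_a + freq_b) / total
    expected_b = size_b * (freq_a + freq_b) / total
    ll = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:
            ll += observed * math.log(observed / expected)
    return 2 * ll


def keywords(tokens_a, tokens_b, top=5):
    """Rank words that are relatively more frequent in A than in B."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    scored = []
    for word, fa in counts_a.items():
        fb = counts_b.get(word, 0)
        # Keep only words over-represented in corpus A.
        if fa / len(tokens_a) > fb / len(tokens_b):
            scored.append((word, log_likelihood(fa, len(tokens_a), fb, len(tokens_b))))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


# Toy data, invented for this example only.
press = tokenize("Police report armed robbery downtown. Robbery suspects fled. Police seek witnesses.")
talk = tokenize("We feel afraid at night. We avoid the street at night because we feel unsafe.")

for left, node, right in concordance(press, "robbery", window=2):
    print(f"{left:>20} [{node}] {right}")
print(keywords(press, talk, top=3))
```

On real data, the press corpus and the focus-group transcripts would take the place of the two toy strings, and a Portuguese-aware tokenizer would probably be needed.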

The comparative analysis of the two corpora will make it possible to explore in detail the relationships between official statistics about urban violence, media representations and citizens’ views. A better understanding of these relationships can help to alleviate the consequences of urban violence on citizens’ lives, and to foster attitudes conducive to the solution of the social problems that cause the violence in the first place.

Gypsies, tramps and thieves? UK national newspaper depictions of Romanians and Bulgarians analysed

British tabloid newspapers repeatedly associated Romanians – but not Bulgarians – with criminality and anti-social behaviour during 2012-2013, a comprehensive new “big data” report by Oxford University’s Migration Observatory shows.

The report, Bulgarians and Romanians in the British national press, was written by CASS Challenge Panel Member William Allen and Dora-Olivia Vicol at the Migration Observatory at Oxford University. It provides a detailed analysis of the language used by 19 British national newspapers to discuss Romanians and Bulgarians between December 1st 2012 and December 1st 2013. The analysis encompasses 4,000 articles, letters and comment pieces mentioning Romanians and/or Bulgarians, a total of more than 2.8 million words.

Key findings include:

  • Language used by tabloid newspapers to describe and discuss Romanians as a single group was frequently focused on crime and anti-social behaviour (gang, criminal, beggar, thief, squatter). This was less prevalent in broadsheet newspapers.
  • Where Romanians and Bulgarians were discussed together, this was consistently in the context of immigration, across both tabloid and broadsheet newspapers.
  • Verbs used to describe or discuss Romanians and Bulgarians together, across both broadsheets and tabloids, were frequently related to travel (come, arrive, move, travel, head). In tabloids these included metaphors related to scale (flood, flock).
  • Words appearing before “Romanians and Bulgarians” in both tabloid and broadsheet newspapers were frequently related to prevention of movement (stop, control, block in tabloids; deter, restrict, dissuade in broadsheets).
  • References to Romanians and Bulgarians together were frequently associated with specific numbers, across both tabloid and broadsheet newspapers. The most common specific numbers were 29 million – the approximate combined populations of Romania and Bulgaria – and 50,000 – a prediction from MigrationWatch, a pressure group which campaigns for reduced immigration, of how many A2 migrants would be added to the UK population each year for five years following the end of transitional controls.
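Findings such as "words appearing before a phrase" rest on counting what occurs in a fixed position next to a search term. The sketch below is a minimal illustration of that idea, not the report's actual method, which is not described here. The helper name `words_before`, the simple tokenizer and the sample sentences are all invented for this example.

```python
import re
from collections import Counter


def words_before(text, phrase, stopwords=frozenset()):
    """Count the word immediately preceding each occurrence of `phrase`."""
    tokens = re.findall(r"\w+", text.lower())
    target = phrase.lower().split()
    n = len(target)
    counts = Counter()
    # Start at 1 so that every match has a preceding token to record.
    for i in range(1, len(tokens) - n + 1):
        if tokens[i:i + n] == target and tokens[i - 1] not in stopwords:
            counts[tokens[i - 1]] += 1
    return counts


# Invented sentences, for illustration only; not taken from any newspaper.
sample = (
    "Ministers want to stop Romanians and Bulgarians claiming benefits. "
    "New rules would restrict Romanians and Bulgarians. "
    "Plans to stop Romanians and Bulgarians were debated."
)
left = words_before(sample, "Romanians and Bulgarians")
print(left.most_common())
```

Running the same count separately on tabloid and broadsheet texts would allow the two sets of preceding words to be compared.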

Some language associated with stories unrelated to UK migration was also evident – particularly Romanian abattoirs implicated in the horsemeat scandal and the blonde Bulgarian Roma child who sparked an ‘abduction’ investigation in Greece.

William Allen, co-author of the report, said: “The report is valuable because it provides a comprehensive account of how British national newspapers discussed Romanians and Bulgarians during a key period. The language used to describe Romanians – particularly in tabloid newspapers – often mentions them alongside criminality and anti-social behaviour, while this was not the case with Bulgarians.” Read the full report here.