Big data media analysis and the representation of urban violence in Brazil

Urban violence is a major problem in Brazil: the average citizen is affected by acts of violence, more or less directly, on a daily basis. This creates a general state of fear and insecurity among the population, but, at the same time, may promote a sense of empathy with the less privileged classes in Brazil. Urban violence is also a regular topic in daily conversations and news media, so that people’s perceptions of the nature of this phenomenon are partly mediated by discourse. In particular, daily press reports of acts of violence may affect people’s views and attitudes in ways which may or may not be consistent with the actual incidence, forms and causes of violence.

This collaborative project between UK and Brazilian scholars, funded by the ESRC and CONFAP, will investigate the linguistic representation of urban violence in Brazil by applying the techniques of Corpus Linguistics to two datasets, or ‘corpora’:

  1. The transcripts of two focus groups on living with urban violence conducted in Fortaleza, Brazil, for a total of approximately 20,000 words;
  2. A 2-million-word corpus of news reports in the Brazilian press, to be constructed as part of the partnership.

The comparative analysis of the two corpora will make it possible to investigate the relationships between official statistics about urban violence, media representations and citizens’ views. A better understanding of these relationships can help to alleviate the effects of urban violence on citizens’ lives, and to foster attitudes conducive to the solution of the social problems that cause the violence in the first place.
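Because the two corpora differ greatly in size (roughly 20,000 words against 2 million), raw word counts cannot be compared directly. The sketch below illustrates one common way of handling this in Corpus Linguistics: normalising frequencies per million words and scoring the difference with a log-likelihood statistic. It is an illustration only, not the project's documented method; the function names, the toy token lists and the choice of statistic are all assumptions made for this example.

```python
import math
from collections import Counter


def per_million(count, corpus_size):
    """Normalised frequency: occurrences per million words."""
    return count * 1_000_000 / corpus_size


def log_likelihood(count_a, size_a, count_b, size_b):
    """Log-likelihood score for one word's frequency in corpus A vs corpus B.

    Higher values mean the relative frequencies differ more strongly;
    0.0 means the word is equally frequent (relatively) in both corpora.
    """
    total = count_a + count_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    score = 0.0
    # A zero count contributes nothing to the sum (and log(0) is undefined).
    if count_a > 0:
        score += count_a * math.log(count_a / expected_a)
    if count_b > 0:
        score += count_b * math.log(count_b / expected_b)
    return 2 * score


# Toy data standing in for the two corpora (hypothetical, not project data).
focus_group_tokens = "we feel fear every day fear of violence in the street".split()
news_tokens = "police report violence in the city police say the suspect fled".split()

focus_counts = Counter(focus_group_tokens)
news_counts = Counter(news_tokens)

for word in ("fear", "police", "violence"):
    score = log_likelihood(
        focus_counts[word], len(focus_group_tokens),
        news_counts[word], len(news_tokens),
    )
    print(word, round(score, 2))
```

In a real analysis the token lists would come from the tokenised focus-group transcripts and news reports, and the words with the highest scores would be candidates for closer qualitative reading.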


Principal Investigators:

  • UK: Professor Elena Semino (Lancaster University)
  • Brazil: Professor Heloísa Pedroso de Moraes Feltes (University of Caxias do Sul)


  • UK: Professor Tony McEnery (Lancaster University)
  • Brazil: Professor Ana Cristina Pelosi (University of Santa Cruz do Sul and Federal University of Ceará)

Read the latest updates on this project:

  • Corpus compilation: working paper now available (3 November 2015)

    We are pleased to announce that the CASS Corpus on Urban Violence in Brazil is now ready to be analysed. It contains 5,127 articles (1,778,282 words) published between January and December 2014 by four Brazilian newspapers: Folha de São Paulo, Estado de São Paulo, Zero Hora and Pioneiro. This working paper explains the process …

  • CASS Corpus Linguistics workshop at the University of Caxias do Sul (UCS, Brazil) (11 June 2015)

    Last month at UCS (Brazil), the CASS Corpus Linguistics workshop found a receptive audience that participated actively and engaged enthusiastically in the discussion. The workshop ran on 27-28 May, led by CASS members Elena Semino, Vaclav Brezina and Carmen Dayrell, and was perfectly organised by the local committee, Heloísa Feltes and Ana Pelosi. This workshop brought together …

  • New CASS project: Big data media analysis and the representation of urban violence in Brazil (17 December 2014)

    A new project in CASS has been funded jointly by the UK’s Economic and Social Research Council and the Brazilian research agency CONFAP. The project will involve a collaboration between two Lancaster academics (Professors Elena Semino and Tony McEnery) and two Brazilian academics: Professor Heloísa Pedroso de Moraes Feltes (University of Caxias do Sul) and …