A Journey into Transcription, Part 2: Getting Started

training:
MASS VERB:
The action of teaching a person or animal a particular skill or type of behaviour.

So how to begin?  With experts as our guides (and thankfully no animals in sight!)…

The Context:  The first week was to be dedicated to training.  We began by watching a short video clip of a Trinity examination in progress.  Although our day-to-day work is based purely on audio recordings, we really appreciated having this quick peek into the world of the examination room.  Being able to picture the scene when listening to exam recordings somehow brings the spoken language to life.

Picture this: a desk with a friendly examiner seated at one side; tape recorder in situ and possibly a fan whirring (quietly, we hope) in the background; a pile of papers (perhaps held down by a paperweight); and then, most importantly for us in this research into learner language, a student seated on the other side of the desk: some nervous, some shy, some confident, some excited, some reluctant to speak and a rare few who might even have felt quite at home seated on the other side of the desk!

Time spent viewing this clip was truly a valuable introduction to the context of this research and the real world to which the audio transcriber is privy on a daily basis.

What next?  Enthusiastic to get started, headsets on, foot pedals down…

Practice File:  We started with a practice recording that had been transcribed previously, applying to it our first set of transcription conventions.  (These have subsequently been altered and updated on numerous occasions.)  This was an extremely valuable process – in listening separately and together to sections of the recording and in comparing our own transcripts with each other and with the original, we quickly realised the range of subtleties that are involved in this task.  The aim, of course, is for transcribers to do as little interpretation as possible and to be able to apply the conventions in a more or less uniform manner, thus making the transcription process as straightforward as possible.  This, after all, is what will enable us to build a reliable corpus of words that are actually uttered.  Whilst the technology now exists to generate text from spoken words, the accuracy of the text produced does not come close to that produced by a real-life human transcriber.

Key to this task is the fact that it is unlike transcription in other working environments; we are not seeking to produce grammatically correct punctuated documents such as you might find on a BBC website when you want to review that radio programme you heard, or perhaps missed.  In spoken language there are only utterances and our job is to record every utterance precisely by following the given conventions, the only punctuation in sight being apostrophes and the odd question mark.  So is that syllable a word ending, a false start to another word, perhaps a filler used intentionally to maintain a turn in conversation, or perhaps an involuntary sound? All these are natural features of spoken discourse.  Tackling this challenge and striving to produce a document that represents as accurately as is humanly possible the words actually uttered by each individual speaker – once again, here is the challenge that makes our job enjoyable and rewarding.

And finally… A Transcriber’s Thought For The Day:

I tried to catch some fog.  I mist.

A Journey into Transcription, Part 1: Our Approach

To Transcribe:
VERB:
to put (thoughts, speech, or data) into written or printed form
origin:
mid 16th century (in the sense ‘make a copy in writing’):
from Latin transcribere, from trans- ‘across’ + scribere ‘write’

In September 2013 we applied for the post of Audio Transcriber in the CASS Office in the Department of Linguistics and English Language here at Lancaster University.  The job description seemed straightforward: to transcribe audio tape materials according to a predefined scheme and to undertake other appropriate duties as directed.  And the person specification?  As you would expect, a list of essential/desirable skills including working effectively as part of a team; the ability to learn and apply schemes (more of that later); and the ability to work with a range of accents and dialects of English (this is the fun part!).

We say the post of Audio Transcriber since, as far as we knew, only one post was available.  How wonderful to find ourselves both appointed (long may the funding last!).  The opportunity to establish a slick working team, to consult when problems arise and, not least, to celebrate the successes (yes, transcribing is a rewarding job!) is a huge benefit not only to ourselves in our work but also to the success of the project as a whole.  In the ESRC Centre for Corpus Approaches to Social Science, it must be the corpus that is at the heart of the centre.  Knowing that we play a key role within the team working together to develop this corpus, we take great pride in what we do.  After all, our listening skills, our focus on accuracy and our meticulous attention to detail have the potential to help develop a corpus of excellent quality, and this will make a vital contribution to the validity of all the research that will follow.  Quite simply, it is this which makes our job so enjoyable and rewarding.

Our day-to-day work involves transcribing recordings of oral examinations taken by learners of English as a second language at elementary, intermediate and advanced stages.  The examinations have been carried out by Trinity College London and have taken place in various countries: Spain, Mexico, Italy, China, India and Sri Lanka so far.  Each language and each stage has its own unique features.

Seven months and 1.5 million words later (Stage One completed and celebrated with colleagues and cake!), we were delighted to be invited to contribute a blog documenting our experience as transcribers.  Over the coming months we plan to describe and discuss various aspects of the job.  The aim is to offer other transcribers and researchers an insight into this particular process.

Look out for the next instalment on Getting Started!

And finally… A Transcriber’s Thought For The Day:

They told me I had type A blood, but it was a type-O.

Trinity Lancaster Corpus: A glimpse of the future

At Trinity we are totally impressed that our spoken learner corpus is now just over 1.5 million words. Although there are still some quality checks to run, it means we’ve reached that anticipatory moment where we can start digging into the goldmine and seeing what insights the data can offer. We’ve been working closely with CASS so that their team have been able to participate in Trinity’s test creation processes as well as examiner training sessions. This has allowed the researchers to fully understand the communicative skills the exam elicits and to identify interesting aspects of language that might be investigated. Equally, the Trinity team are very much looking forward to an upcoming visit to Lancaster where the CASS team will guide us on the corpus tools and the type of reports we can run that will access the data we need for our own research interests into the test itself.

Currently we are so excited at having such a wealth of data at our fingertips that we are in that dangerous moment of skimming the corpus to see if our assumptions are played out. We’ve all been there – when you are convinced that the corpus will finally confirm your long-held beliefs about how learners use language – only to discover that you are wrong or the evidence is not there! This is, however, significantly ameliorated by emerging findings that will allow us to add a quantitative component to our test validity arguments. Mining corpus data indicates a new approach to evidencing that the test tasks are performing as anticipated – and as designed! And then there’s that little delve where the numbers and patterns indicate something unanticipated – how delicious!

This Trinity Lancaster corpus is fascinating because it comprises data from tasks where the candidate is given free rein to ‘show off’ their language skills and engage in authentic interaction with the examiner – thus giving a very close parallel with real life and so enriching applied linguistic research. At the same time, the test also contains a task type which really homes in on the candidate’s skill at enacting Gricean principles of co-operation, thus allowing us to investigate metacognitive processes such as how learners manage a conversation.

It has to be said that we recognize that the opportunities and insights offered by this unique corpus are in large part down to the high-quality corpus transcription and annotation process implemented by CASS. We are now planning the collection of 2014 data, including, we hope, a wider range of L1s – because we are now totally addicted!