A Journey into Transcription, Part 3: Clarity

As audio transcribers, we listen to sound. Of primary importance is the clarity of that sound.



Clarity: the quality of being clear (‘easy to perceive, understand, or interpret’), in particular:

  • The quality of being coherent and intelligible
  • The quality of being easy to hear; sharpness of sound
  • The quality of purity

Let’s consider these qualities and their relevance to the audio transcriber.

The quality of being coherent and intelligible

All of us, when engaged in discussion and conversation, want our language to be coherent and intelligible.  However, for the transcriber listening to a recording, its clarity in the sense of being coherent and intelligible is something of a paradox; it is simultaneously useful and yet also to be ignored.

Naturally, our brains are programmed to try to organise and make sense of language. For this reason, context can often give the transcriber an invaluable clue to words which may be difficult to hear in a recording.

What we hear at the initial drafting stage of transcription can turn out to be quite different when we re-listen to, edit and proofread the transcript with the glorious benefit of wider context. Here are a few of the more entertaining examples:

you wear glasses becomes yoga classes

it’s among the becomes it’s a manga [comic]

yes she was becomes H G Wells

whisking gently becomes whiskey J&B [discussing a recipe!]

However, since the raison d’être of this corpus is to serve as a basis for research into the language of learners, part of the skill here lies in not being distracted by our knowledge of grammatical rules and the surrounding context.

The audio transcriber’s task is to hear what the learner actually says; this may not always be what they (or we) would expect to be logical or appropriate (or desirable!). Indeed, the transcription conventions are specifically designed to minimise the risk of such distraction during the transcription process. In the context of a Graded Examination in Spoken English (GESE), the students (and, on rare occasions, the examiners) can, and sometimes do, say anything!

Below are a few examples of wrong words and non-words which are to be transcribed, alongside words which may have been intended by the speaker:

Wrong words actually produced (words which were possibly intended in brackets):

  • rise children (raise children)
  • the child can win his father’s hurt (the child can win his father’s heart)
  • team prayers (team players)
  • advantages in medicine (advances in medicine)
  • in a car crush (in a car crash)
  • it’s a paint (it’s a pain)
  • size the day (seize the day)

Non-words actually produced (words which were possibly intended in brackets, where known):

  • professional and amateurial (professional and amateur)
  • some sessed


As audio transcribers, we listen in order to capture every utterance we can possibly reach, almost as if we were catching butterflies. These will include the expected and the surprising; the rehearsed and the spontaneous; the grammatical and the ungrammatical; the complete and the interrupted; the intentional and the accidental; the confident and the hesitant; the practised and the experimental, as well as every utterance in between.

So clarity, in the sense of being coherent and intelligible, is of mixed significance.

With regard to features of learner language, it is important to note that in this corpus we do not attempt to transcribe different accents or non-standard pronunciations. Sometimes the speaker is given the benefit of the doubt and the correct dictionary form is transcribed. This may sound rather open to subjectivity; however, within our team we have found that our decisions are consistent. For example, when a proficient intermediate student discussing Pollution and Recycling refers to a recycling bank, the dictionary form is transcribed, even though it did sound a little like a recycling bang (the mind boggles!).

The quality of purity (freedom from adulteration or contamination) and the quality of being easy to hear; sharpness of sound (‘penetrating’; ‘clearly heard through or above other sounds’)

This is the sense of clarity which is at the heart of the audio transcriber’s task – pure sound which is easy to hear!

With our wonderful software we are able to adjust the playback speed without adulteration of pitch. This tool, together with the replay function and volume adjustment, enables us to re-listen as necessary and to make our very best attempt at capturing each utterance as accurately as possible.

Word endings are a good example. Researchers who are interested in learners’ use of tenses in spoken language must be able to trust the integrity of the corpus, and therefore the work of the transcriber. Does this learner actually produce that final –ed past tense ending or not? Other distinctions that challenge the transcriber’s ears include is/it’s, this/these and can/can’t. It is the word that is produced which is important, not the word that you know they meant to say. Here, of course, making the distinction hinges on the transcriber being able to hear the utterance clearly!

Although we know that examination rooms are chosen especially for the purpose at each venue, a wide range of barriers, or sources of adulteration or contamination, can still present themselves:

microphone placement

  • microphone at a distance from the speaker
  • microphone on wobbly surface
  • microphone dislodged as the table is kicked by a nervous foot…

background noise

  • the sound of a folder full of student props being zipped or unzipped
  • the sound of papers rustling
  • the sound of furniture scraping across the floor either in the room, next door or in the corridor outside… or perhaps overhead
  • the sound of bells or buzzers ringing (the examination venue is often a school)
  • the sound of children in a playground outside (especially if windows are open for ventilation). An anecdote: once, on hearing this sound, I found myself planning to re-listen to the recording later, when playtime was over… only to realise that this particular playtime would never end. How delighted those children would have been if only they had known they were to have an everlasting playtime!
  • the sound of traffic outside; engines ticking over patiently in traffic jams, engines being revved impatiently, horns being used
  • the sound of emergency vehicle sirens
  • the sound of monkeys chatting on the balcony outside (perhaps taking their daily language class…?)

overlapping speech

  • whilst a natural feature of spoken discourse, this tends to be less common in the context of a formal examination. However, there are times when examiner and student speak at the same time and, although we do not mark the overlap, each person’s speech needs to be disentangled as far as possible so that the transcription remains true to the utterances of each speaker

environmental factors

  • in hot climates, a fan can be a necessity (although it can also be loud!)
  • likewise, a window opened for ventilation also lets in all manner of external sounds

This is not a criticism of the venues or their facilities, nor an attempt to absolve ourselves of responsibility or a bid to excuse our mistakes.  It is just something that transcribers encounter on a daily basis and is part of the job.

One of our vital transcription conventions addresses the issue of unclear speech:

Convention:  Mark as unclear with a guess (if possible) or a time stamp:  <unclear=guess> 

When we first embarked on this huge project to transcribe millions of words for the corpus, the use of the <unclear> convention within a transcription almost felt like a ‘failure’.  Not so!  Sometimes it is simply a fact – the utterance is unclear.

How much more useful it is for a researcher to know this than to base their study on an unreliable transcript, thus perverting the course of their research! This way, researchers will know that the <unclear> section of a transcript is the transcriber’s best guess, and/or they can use the time stamp to find the exact moment in the recording quickly and listen to the utterance themselves, enabling them to make their own judgement for the purposes of their research.

Whilst of course we hope that we are able to hear as much as possible, it is inevitable that there will be sections of discourse which are unclear.

And finally… A Transcriber’s Thought For The Day:

We have been surprised to find that audio transcription has affected our listening habits: we have grown accustomed to scanning around utterances, considering a plethora of possibilities in the hope of alighting on the most accurate version of what is being said.

One day this summer, whilst listening to the radio on my way home, I heard a report about the Commonwealth Games in Glasgow. Imagine my dismay when I realised that, instead of the words actually produced (“the queen’s baton”), I had heard “the queen spat on…”!