How to Deal with Difficult Audio and Video Transcriptions

Language services have two major components: translation and interpreting. Each has several sub-branches, which are specialized services that cater to different topics. One sub-branch under translation is transcription, which is currently one of the fastest-growing jobs in the United States.

Transcription converts audio and video recordings into text. Audio files are commonly in MP3 or WAV format, while video files come in formats such as AVI, MP4, FLV, MOV or WMV. Transcriptions serve many purposes, from medical to legal to business. They can be used as evidence in a trial or as a reference for voice recordings and translation.

In transcription work, the ideal scenario is to have recordings where the audio is clear and audible and in the same language, even if there are multiple people in the recording. But that is not always the case. There will always be instances when the recordings are complicated.

The transcription process is already difficult, and it becomes harder when several languages are heard in one recording and people are speaking in a hurried and tense manner.

Complications in Transcription

A good transcriptionist is trained to expect complexities in transcription work. Time stamps are required to indicate the exact time when something is spoken. The words of the main speakers are not the only things transcribed. Almost everything within the recording is transcribed, including background noise, coughing, laughter and anything else that is heard or spoken in the background.

Transcription covers every element that is captured in the recording, including the elements that make transcription complicated and difficult.

Background Noise

Background noise includes strong wind, other people screaming or talking, sirens and various traffic sounds. These elements compete with the voices of the main speakers and slow down the transcription work. In most cases, the transcriptionist marks a part of the audio as inaudible if it is no longer possible to understand what is being said.

Different Languages

Speakers using other languages also make transcription difficult. In a report or interview, for example, one person may speak one language while another speaks a different language, so a translator is needed. If this is the case, different persons may be needed to transcribe the recording.

Slang and Accent

Slang or a strong accent is another element that complicates the transcribing process. Even if only one language is used, the accent of the speaker or the slang the speaker uses can pose a challenge to the transcriptionist. Ideally, the speakers would use neutral vocabulary and a neutral accent.

Speed of Conversation

The volume and speed of the conversation also affect the process of transcription. The faster the speaker talks, the slower the transcription becomes, because the transcriptionist has to rewind the recording several times to pick up all the audio. Even if the transcriber is a quick typist, it takes time to understand what is being said and turn it into text. The volume of the speaker’s voice is also important. If it is very low and quiet, it can be hard to understand or pick up what is being said.

Role of the Transcriber

In the early days of transcription service, the transcriber had to use shorthand to write down everything that was heard in the recording before cleaning it up and typing it. Today, transcribers use computers, foot pedals and professional transcribing applications.

The audio or video file can be sent online through email or other file sharing applications. These make it easier for transcriptionists to download the file, load it into the professional software and start typing the transcription.

The transcriber will add the proper punctuation marks, new paragraphs and full stops.

A professional transcriber who uses touch-typing normally types about 75 words per minute. Taking this as the basis, the industry standard for transcribing a 60-minute video or audio recording is about 4 to 5 hours (minimum) of transcription work. Several factors may affect the speed of the transcription, such as the speed of the conversation, the number of persons speaking in the recording, the clarity of the recording and the clarity of the speakers’ voices.
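The arithmetic behind that rule of thumb can be sketched in a few lines of Python. This is only an illustration: the function name and the idea of scaling the 4-to-5-hour figure linearly with recording length are assumptions, and because the figure is described as a minimum, the result should be read as a lower bound.

```python
def estimate_transcription_hours(recording_minutes, low_ratio=4.0, high_ratio=5.0):
    """Return a (low, high) estimate of work hours for a recording.

    The default ratios come from the rule of thumb of 4 to 5 hours of
    work per 60 minutes of audio. Linear scaling is an assumption.
    """
    if recording_minutes < 0:
        raise ValueError("recording_minutes must be non-negative")
    recording_hours = recording_minutes / 60
    return recording_hours * low_ratio, recording_hours * high_ratio


low, high = estimate_transcription_hours(60)
print(f"{low:.1f} to {high:.1f} hours")  # 4.0 to 5.0 hours
```

Difficult recordings (several speakers, fast speech, poor clarity) would push the real figure above this estimate.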

Variables Make it a Complicated Task

All of these variables add time to the transcribing work. The client must understand that a professional transcriptionist is not able to type at normal typing speed because of the need to capture all the audible sounds in the audio or video file. On average, a speaker talks at a speed that is four to five times faster than the typing speed of a transcriber.

How to Deal with Transcriptions

Transcription is one of the most demanding and labor-intensive of all translation services. It requires a high level of skill from the transcriptionist, from listening to the audio or video file and researching the subject matter to understanding the context of the recording and typing the audio into readable text.

For a professional transcriptionist, it is important to know what the clients want. Some of the preferences to confirm include:

Typing the audio exactly as spoken, including audible pauses such as ‘ers’ and ‘ums’, or removing them while retaining the rest of the audio. Clients may want the transcription to be grammatically correct, or may want non-native speakers of English to read like native speakers.
Removing or including the full questions when working on an interview. If the interviewee says something “off the record,” the transcriber has to ascertain whether the client wants the response removed or included with an “off the record” mark.
Capturing and marking the pauses. It is also important to know how to treat the pauses made by the speaker or speakers.
Marking the words or sections that are not clear.
Adding time stamps to the document.
Identifying the speakers.
Using U.K. or U.S. spelling.
Setting the line spacing and any special font.

Style Standards

Generally, a translation company that offers transcription services follows a style guide. The conventions below illustrate what such a guide covers.

Brackets

Brackets are often used to indicate sounds that interrupt the main dialogue. They enclose a short description of the sound, such as [applause], [laughter] or [phone ringing]. If there is a notable stop in the speaker’s sentence, it is indicated with a bracket, such as [cut off], or with an ellipsis (three dots). The transcriptionist can also enclose a description of the speaker’s tone of voice in brackets, such as [angry], [happy] or [joking].

The transcriber can also use brackets followed by a time stamp to flag a passage where the spoken words are uncertain, for example, [crosstalk][00:00]. The speaker’s sentence should be completed before putting the other speaker’s words in another paragraph. The term “inaudible” should be enclosed in brackets followed by a time stamp if it is not possible to understand what is being said. The same method applies when the transcriber is unsure of a name, title or spelling: “phonetic” is enclosed in brackets and the word is spelled phonetically to the best of the transcriber’s ability.
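As a rough illustration of the bracket-plus-time-stamp marks described above, the short Python sketch below builds marks such as [crosstalk][00:00]. The helper names are hypothetical, and the two-digit minutes-and-seconds layout is an assumption based on that example.

```python
def format_stamp(total_seconds):
    """Format a position in the recording as [MM:SS]."""
    minutes, seconds = divmod(int(total_seconds), 60)
    return f"[{minutes:02d}:{seconds:02d}]"


def mark(label, total_seconds):
    """Build a bracketed label followed by a bracketed time stamp."""
    return f"[{label}]{format_stamp(total_seconds)}"


print(mark("crosstalk", 0))   # [crosstalk][00:00]
print(mark("inaudible", 95))  # [inaudible][01:35]
```

Keeping the label and the time stamp in one helper makes it easier to apply the same mark consistently across a long transcript.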

Time Stamps

Some transcriptions are required to have time stamps. Time stamps are also enclosed in brackets. They should be added every 30 seconds. Each one should come after the name of the speaker and before the transcribed words. Time stamps should also be added when there is a change of speaker.

Colon Usage

A colon should be used after the name of the speaker. The speaker’s name should be bolded to distinguish it from the rest of the text. If a time stamp is needed, it should be added after the colon that is placed after the speaker’s name.

Time stamps should only be indicated in minutes and seconds, so an hour and fifteen minutes is written as [75:00].
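The time stamp and colon rules can be combined in a small Python sketch. The function names are hypothetical, starting the 30-second series at zero is an assumption, and bolding of the speaker’s name is left to the word processor because plain text cannot show it.

```python
def format_time_stamp(total_seconds):
    """Format elapsed time in minutes and seconds only, e.g. [75:00]."""
    minutes, seconds = divmod(int(total_seconds), 60)
    return f"[{minutes:02d}:{seconds:02d}]"


def speaker_line(name, total_seconds, words):
    """Speaker name, colon, time stamp, then the transcribed words."""
    return f"{name}: {format_time_stamp(total_seconds)} {words}"


def interval_stamps(duration_seconds, step=30):
    """List the time stamps that fall every `step` seconds (assumed to start at 0)."""
    return [format_time_stamp(t) for t in range(0, int(duration_seconds) + 1, step)]


print(format_time_stamp(75 * 60))                       # [75:00]
print(speaker_line("Manager", 30, "Good morning."))     # Manager: [00:30] Good morning.
print(interval_stamps(90))
```

Because hours are never shown, the minutes field simply keeps counting past 59.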

Titles

Only the name, title or gender is used as a speaker label, such as [Atty. Landon], [Manager] or [woman], in that order of preference. It is important to use descriptive speaker labels to make them distinct. If there is a large group and many are talking, it is all right to label them as [audience] and refer to a person from the audience speaking as [audience member] instead of [man].

Background Sounds

If a persistent background sound does not affect the quality of the dialogue, note it in the transcription at least once, at the time it first occurs. It is all right to remove filler words or statements. Remove conjunctions such as ‘but’ or ‘and’ that start a speaker’s sentence.
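The rule about sentence-opening conjunctions could be applied mechanically along the following lines. This is a minimal Python sketch, not a standard tool: the function name is hypothetical, only ‘and’ and ‘but’ are handled, and re-capitalizing the next word is an assumption about the desired result.

```python
import re

LEADING_CONJUNCTIONS = ("and", "but")

# Match a conjunction at the very start of the sentence, as a whole word,
# along with any comma or spaces that follow it.
_PATTERN = re.compile(
    r"^\s*(?:" + "|".join(LEADING_CONJUNCTIONS) + r")\b[\s,]*",
    flags=re.IGNORECASE,
)


def drop_leading_conjunction(sentence):
    """Remove a sentence-opening 'and' or 'but' and re-capitalize the rest."""
    stripped = _PATTERN.sub("", sentence, count=1)
    if stripped == sentence or not stripped:
        return sentence  # nothing to remove, or nothing would be left
    return stripped[0].upper() + stripped[1:]


print(drop_leading_conjunction("And then we left."))          # Then we left.
print(drop_leading_conjunction("Android phones are common."))  # Android phones are common.
```

The word-boundary check keeps words that merely begin with the same letters, such as "Android", untouched.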

Transcription Style

Some clients need word-for-word transcriptions, where the verbal and nonverbal nuances are captured. When the request is for a verbatim transcription, the transcriber must include all audible sounds and how the words are pronounced, such as ‘cos’, ‘coz’ or ‘cuz’ instead of ‘because’, as well as slang, repeated phrasings, fillers and false starts.

Accuracy is important in verbatim transcription. Transcribers punctuate the text as usual, but grammatical changes are taboo. Sounds that do not interrupt the conversation should be included in brackets every time they occur. A hyphen is used mid-word when a speaker stutters or repeats certain words.

The transcriptionist should have a copy of the client’s special terminology so that the transcription is accurate.

Wrapping Up

Many rules and conventions apply in audio transcription, which is why it is a complicated field of work. It demands more from the transcriber and is therefore a specialized field. Work only with a professional transcription services provider with years of experience to ensure the accurate transcription of your audio or video file.