 Respeaking is a technique for transcribing, reformulating or translating an oral text in real time, as it is being spoken, by means of automatic speech recognition (ASR). The ASR software converts everything the respeaker dictates into the microphone into written text. This technology offers a wide range of applications: from real-time intralingual and interlingual subtitling of television, cinema and radio products to the transcription of university lectures and conferences and the minuting of criminal trials, town council meetings and board meetings.

 The respeaking technique differs from other subtitling techniques because it is based on the respeaker’s oral production of a mid-text (MT), which is then transcribed by the automatic language processing (TAL) software. The respeaker listens to and watches the source text (ST) and repeats, reformulates or translates it orally, thus creating a first mid-text (MT1) for the machine. Suitable software processes this vocal input and turns it into electronic subtitles (MT2), which are then verified, corrected if necessary and transmitted to TV screens as the target text (TT). The process has an unavoidable drawback: a delay between the production of the ST and the display of the corresponding subtitles. Between 2 seconds (with no correction) and 8 seconds are needed for the respeaker to perceive and understand the ST and mentally process it into the MT1, for the human-machine interaction, for the transfer from one piece of software to another, for checking and, if necessary, correcting the transcription, and for the actual transmission of the subtitles.
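
 The delay can be made concrete with a minimal sketch. It assumes that each ST utterance has a known start time in seconds (the timestamps below are hypothetical) and uses only the 2 to 8 second range stated above to compute the window in which the corresponding subtitle would appear. The function name `subtitle_window` is illustrative and does not belong to any subtitling software.

```python
# Delay bounds taken from the text: 2 s (no correction) to 8 s.
MIN_DELAY_S = 2.0
MAX_DELAY_S = 8.0


def subtitle_window(st_time_s: float) -> tuple[float, float]:
    """Return the (earliest, latest) display time of the subtitle
    for an utterance spoken at st_time_s seconds into the program."""
    return (st_time_s + MIN_DELAY_S, st_time_s + MAX_DELAY_S)


# Hypothetical utterance start times in the source text (seconds).
utterances = [0.0, 3.5, 10.0]

for t in utterances:
    earliest, latest = subtitle_window(t)
    print(f"spoken at {t:.1f} s -> subtitle between {earliest:.1f} s and {latest:.1f} s")
```

 The sketch treats the delay as a single range; it does not attempt to split it among the individual stages, since the text gives no per-stage figures.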

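 From a practical point of view, respeaking follows the process outlined above: the ST is respoken as the MT1, recognized by the machine as the MT2, checked and, if necessary, corrected, and finally transmitted as the TT.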

 The respeaker usually works in a soundproof booth and must be ready to subtitle any kind of program, whatever the speed of speech or the technical terminology involved. If the program to be subtitled lasts longer than a working shift (from 15 to 40 minutes, depending on the respeaker’s experience and on the type of program) or is particularly complicated (discussions, parliamentary debates, etc.), two respeakers work together, either taking turns or ‘dividing’ the speakers between them. Thanks to modern technology, the respeaker does not need to be present where the program is shot and can even work from home.

 From a psycho-cognitive perspective, respeaking requires deep concentration, the ability to work under stress and the determination not to give up in the face of results that are inevitably imperfect. The respeaker has to carry out various activities at the same time:

  • continuously monitor the video to check that the delayed display of subtitles does not affect the communicative efficacy of the TT. If it does, special strategies, such as clarification, must be used;
  • listen to the audio and understand a text that he or she has never heard before (in some cases the respeaker can see the program in advance, but there is no time to prepare pre-recorded subtitles) and that is not always predictable;
  • dictate to the machine a clear and recognizable MT1 resulting from the reformulation, repetition or translation of the ST, while bearing in mind that the TT must be fully comprehensible to the audience and, as far as possible, free of mistakes;
  • check and, if necessary, correct the MT2 before transmitting it (this is not necessary if the respeaker works with an editor or if the subtitling company immediately broadcasts the subtitles the way they are recognized by the machine);
  • transmit the subtitles (not necessary if there is an editor or if the option ‘automatic transmission’ is selected).
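
 The conditions attached to the last two activities can be summarized in a short sketch. It is only an illustration of the rules stated in the list: the class `Setup`, its field names and the function `respeaker_tasks` are hypothetical and are not taken from any real subtitling system.

```python
from dataclasses import dataclass


@dataclass
class Setup:
    has_editor: bool = False              # an editor checks and sends the subtitles
    immediate_broadcast: bool = False     # subtitles are broadcast as recognized
    automatic_transmission: bool = False  # 'automatic transmission' is selected


def respeaker_tasks(setup: Setup) -> list[str]:
    """List the activities that remain with the respeaker for a given setup."""
    # These three activities are always carried out by the respeaker.
    tasks = ["monitor video", "listen to ST", "dictate MT1"]
    # Correction is skipped with an editor or with immediate broadcast.
    if not (setup.has_editor or setup.immediate_broadcast):
        tasks.append("check and correct MT2")
    # Transmission is skipped with an editor or with automatic transmission.
    if not (setup.has_editor or setup.automatic_transmission):
        tasks.append("transmit subtitles")
    return tasks


print(respeaker_tasks(Setup()))                 # respeaker working alone
print(respeaker_tasks(Setup(has_editor=True)))  # respeaker working with an editor
```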

 In more detail, the respeaker’s psycho-cognitive efforts comprise the following:

- the immediate conversion of an oral text into one that will be written: time constraints and the requests of deaf associations favor verbatim subtitles (for intralingual subtitling). However, the conversion from oral to written inevitably entails an adjustment of the text, for which the tactics listed above must be taken into account to generate the desired final product. All this must be done while listening to the original speech, monitoring the video and respecting the processing limits of the machine to avoid a system overload;

- good use of the software and of the peripheral devices (microphone, earphones, keyboard in case the text has to be corrected, etc.);

- control of one’s voice volume and pitch, good pronunciation, clear and precise articulation and enunciation of words, the use of short pauses, etc., to ensure good human-machine interaction;

- the use of easily recognizable terms and the pre-editing of mistakes which could occur due to limitations of the software (homonyms, homophones, similar and/or unknown words, etc.);

- the ability to cope with the stress and frustration due to mistakes made and/or to poor solutions found;

- the coordination with other professionals, such as the editor or other respeakers with whom one works in turns;

- in interlingual respeaking, the transfer between different languages and cultures.

 If these routine activities were accompanied by the continuous verification of the transcribed text, the post-editing of mistakes, the manual transmission of subtitles and the checking of the final product, the psycho-cognitive stress would be huge and would increase the number of mistakes made by the respeaker. The more mistakes are made, the greater the stress and frustration caused by the poor results achieved; this affects concentration, causing even more mistakes. This is why the presence of an editor is deemed necessary.

