New method achieves top marks for the automatic transcription of conversations

Research | Press release | Department of Electrical Engineering and Information Technology | Communications Engineering / Heinz Nixdorf Institute

Researchers at Paderborn University and RWTH Aachen University develop methods for signal enhancement and for labelling who spoke when

Whether at social gatherings or in important office meetings, the automatic transcription of conversations remains a challenge that has not yet been adequately solved. Although modern systems can transcribe spoken language, their recognition performance is still significantly below that of a human. Scientists from the "Communications Engineering" department at the Heinz Nixdorf Institute at Paderborn University have tackled this problem. In the joint project "Automatic Transcription of Conversational Situations" with the "Machine Learning and Human Language Technology" group at the Department of Computer Science 6 at RWTH Aachen University, they used innovative approaches based on distant microphones to make machine-readable transcriptions not only more precise but also more context-sensitive. This enabled the researchers to develop methods for simultaneous signal enhancement and annotation, i.e. labelling who spoke and when. On an established data set for the transcription of conversational situations, these methods achieved new best results in a global comparison. The German Research Foundation (DFG) funded the project over three years with around 300,000 euros.

The challenges of automatically transcribing conversations

Sophisticated software for the automatic transcription of conversations, such as business partner meetings or work group meetings, can replace manual transcription and make work easier. Until now, however, speech recognition has been particularly difficult in this field. "Environmental influences, such as room reverberation, have a negative impact on signal quality. In addition, it is often the case in conversational situations that people interrupt each other or that parallel conversations take place between participants. This means that the signals of several speakers overlap. However, we have managed to develop methods in which it is not necessary to know in advance how many people are speaking at the same time or how often this changes," explains Prof. Dr. Reinhold Häb-Umbach from the Institute of Electrical Engineering and Information Technology and head of the "Communications Engineering" department at the Heinz Nixdorf Institute. "It was also important for us to realise 'end-to-end' recognition in order to avoid inaccurate intermediate results. We tested our new methods to see how accurately they recognise speech, but also evaluated them in terms of the interpretability of the subcomponents and manageability," adds Dr. Ralf Schlüter from RWTH Aachen University.

Important progress for automatic transcription systems

A transcription system should be able to work with recordings of any length and correctly handle conversational situations with one or more speakers. It must be able to assign each transcribed utterance unambiguously to the person who spoke it. Current solutions consist of different modules that work independently of each other: they divide the data into similar sections, distinguish between different speakers and then recognise what has been said. "Our vision was to significantly improve these results by optimising these steps as a cohesive process rather than individually. Accordingly, our goal was to develop a coherent approach to overcome the limitations of current transcription systems – and we succeeded," says Häb-Umbach.

This text was translated automatically.

Photo (Paderborn University): The automatic transcription of discussion situations, e.g. working group meetings, can replace manual recording and make work easier.

Contact

Prof. Dr. Reinhold Häb-Umbach

Communications Engineering / Heinz Nixdorf Institute

Head of Department of Communications Engineering

Phone: +49 5251 60-3626