Microsoft at its annual Build developer conference on Monday showed off an advanced speech-to-text offering called Conversation Transcription Service that is a part of Azure Speech Services. The new service is designed to not only recognise voices but also correctly identify the speakers and transcribe their conversations in real time. Last year, Microsoft notably introduced a device prototype that had an array of microphones to identify attendees in meetings and enable real-time transcriptions. That hardware will be available as a reference through a Developer Device Kit. However, the new offering is touted to enable transcriptions in meetings without the use of any specific hardware.
During the keynote session at the Build 2019 conference in Seattle on Monday, Sonia Dara, Product Marketing Manager, Surface Commercial, Microsoft, demonstrated the Conversation Transcription Service offering that is currently available through Azure Speech Services. The new addition comes as an extension of the development that was showcased last year, which used dedicated cone-shaped hardware. However, as we mentioned earlier, the new service is claimed to be device-agnostic.
Dara demonstrated how the Conversation Transcription Service enabled real-time transcription of a conversation between three speakers simply using the inbuilt microphones of a laptop and two available smartphones. The new service recognises the voice, identifies speakers, and offers transcription. Organisations can also train the language model of Azure Speech Services through Microsoft 365 to add their unique vocabulary and jargon.
Additionally, developers can pair Conversation Transcription Service with the Speech Devices SDK to optimise the experience on multi-microphone devices. The service is designed to take multichannel audio streams and user profiles as inputs to identify speakers and generate transcripts.
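To give a rough sense of how user profiles can help attribute speech to speakers, here is a minimal toy sketch in Python. It is not the Azure Speech Services API and does not reflect how Microsoft implements the service. The function names, the three-number "voice vectors", and the 0.8 threshold are all hypothetical, chosen only for illustration. It assumes each enrolled profile and each transcribed segment has already been reduced to a numeric voice vector, and labels a segment with the closest-matching profile.

```python
import math

# Toy illustration only: NOT the Azure Speech Services API.
# Voice vectors, names, and the threshold are hypothetical.

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def identify_speaker(segment_vector, profiles, threshold=0.8):
    """Return the enrolled name whose profile best matches, or 'Unknown'."""
    best_name, best_score = "Unknown", threshold
    for name, profile_vector in profiles.items():
        score = cosine(segment_vector, profile_vector)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name

def label_transcript(segments, profiles):
    """Prefix each transcribed segment with the identified speaker."""
    return [f"{identify_speaker(vec, profiles)}: {text}" for vec, text in segments]

# Hypothetical enrolled user profiles and transcribed segments.
profiles = {"Alice": [1.0, 0.0, 0.0], "Bob": [0.0, 1.0, 0.0]}
segments = [
    ([0.9, 0.1, 0.0], "Shall we start?"),
    ([0.1, 0.95, 0.0], "Yes, go ahead."),
    ([0.0, 0.0, 1.0], "Hello?"),
]

for line in label_transcript(segments, profiles):
    print(line)
```

In this sketch, a segment that matches no enrolled profile closely enough is labelled "Unknown" rather than being forced onto the nearest speaker.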
“The most interesting thing is when you combine speech recognition with language models that are specific to your organisation data, you can start picking up all the jargons,” Microsoft CEO Satya Nadella said before kicking off the Conversation Transcription Service demonstration at the Build 2019 conference. “So imagine a transcript that gets created, that has the ability to understand the local jargon, that is specific to your organisation, your industry at that way making the transcript that much more useful.”
The Conversation Transcription Service is currently available as a preview. Initially, it is designed for small meetings, though Microsoft says it will enable customisations for large meetings at scale on request.
Disclosure: Microsoft sponsored the correspondent’s flights and hotel for the conference in Seattle, USA.