Twilio Changelog | Oct. 04, 2024

<Gather> New Multi-Provider Speech Recognition Models Public Beta now available

TL;DR: New speech models, multi-provider speech recognition capabilities, and the latest STT API versions are now supported in the new <Gather> Public Beta.

<Gather>, the Twilio platform’s utterance-based Speech-to-Text (STT) capability, takes a significant step forward for voice app builders this week by adding support for (i) the latest Speech-to-Text API capabilities from Google, updating to V2 of their Speech APIs (including new and improved speech models), and (ii) the ability, for the first time, for app builders to choose an alternative speech recognition provider, Deepgram and its speech models, for use in their Twilio <Gather> input="speech" TwiML calls. Developers can choose speech recognition providers and models on the fly to suit their application and use case, and can even change that selection with each question or prompt, or for the processing of each caller’s individual spoken responses.

While <Gather> is the first part of Twilio’s Speech Recognition portfolio to add Deepgram and the new Google API and speech models, other parts of the speech portfolio, e.g. Streaming Real-Time Transcriptions (RTT) and batch transcriptions with Voice Intelligence, will also be able to leverage the new speech models and providers over time.

How can we take advantage of the new <Gather> Multi-Provider Speech Recognition Models Beta capabilities?

Customers who want to try these new speech recognition capabilities in <Gather> with their TwiML voice applications have two options. Builders with existing applications that use <Gather> can select Google v2 STT APIs (instead of the current Google v1 default) on the Voice Settings page of the Twilio Console. Alternatively, builders of new or existing voice applications can specify Google (as “googlev2”) or Deepgram (as “deepgram”) as the provider in the “provider_speechmodel” parameter of their TwiML <Gather> input="speech" code.
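As a rough illustration of the second option, the sketch below builds a <Gather> TwiML response in Python using only the standard library. It rests on several assumptions that are not stated in this announcement: that the provider and model are combined into a single value of the form provider_model, that this value is passed in an attribute named speechModel, and that "telephony" and "nova-2" are valid model names. The helper name build_gather_twiml and the /handle-speech action URL are hypothetical. Check the <Gather> TwiML reference for the exact attribute and model identifiers before relying on any of this.

```python
import xml.etree.ElementTree as ET


def build_gather_twiml(provider: str, model: str, prompt: str,
                       action: str = "/handle-speech") -> str:
    """Return a TwiML document containing one speech <Gather>.

    Assumption: the provider and model are joined as "<provider>_<model>"
    and passed in an attribute named "speechModel". The announcement only
    names the providers ("googlev2", "deepgram"); the attribute name and
    the model names used below are illustrative.
    """
    response = ET.Element("Response")
    gather = ET.SubElement(response, "Gather", {
        "input": "speech",
        "action": action,  # hypothetical webhook path for the result
        "speechModel": f"{provider}_{model}",
    })
    say = ET.SubElement(gather, "Say")
    say.text = prompt
    return ET.tostring(response, encoding="unicode")


# The provider can differ per prompt, so each question can use its own model.
print(build_gather_twiml("googlev2", "telephony", "Please say your account number."))
print(build_gather_twiml("deepgram", "nova-2", "How can we help you today?"))
```

Because each prompt is built independently, an application could return a different provider for each question in a call flow, which matches the per-prompt selection described above.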

Customer benefits 

With these new speech recognition capabilities and providers, and support for their latest STT API versions, Twilio expects to deliver industry-leading speech recognition accuracy and improved performance in noisy environments. Builders can choose from a wider array of speech models suited to their use cases, for longer answers or short utterances, ranging from customer service automations like form-filling and survey responses to speaking naturally to LLM bots in IVRs/Virtual Agents, and more!

 
