Live Translation with Twilio and OpenAI’s Realtime API

October 01, 2024
Written by
Jeff Eiden
Twilion


Traditionally, multilingual customer support has required routing callers to agents who speak their language or using a third-party human translator. While this can work in some cases, it’s neither scalable nor cost-effective, leaving many businesses struggling to efficiently serve a diverse customer base.

With today’s announcement of a collaboration between Twilio and OpenAI, this could now start to change. By integrating OpenAI’s new Realtime API (beta) into Twilio’s platform, we can show you today how live voice translation in the contact center may soon become a reality.

OpenAI is rolling out Realtime API access incrementally. Please watch their site for updates.

To make it easier to get started with live voice translation using Twilio and OpenAI, we’ve created a prebuilt sample application, available on CodeExchange.

In this post, we’ll give you an overview and demo of what we built, show you the architecture, and link you to the code. Let’s get started.

A contact center live translation demo

With this demo setup, two call participants can speak in different languages, with an AI-powered translator enabling real-time communication through a direct audio connection. This setup could help a contact center deliver automated multilingual support without requiring human translation.

Using OpenAI’s Realtime API addresses the latency problem of legacy AI translation systems, which chain speech-to-text (STT), text translation, and text-to-speech (TTS) steps, each of which adds lag to a voice conversation. The Realtime API works on the audio directly, removing the need for those STT and TTS conversions. This emerging capability could improve customer service for a global audience and help reduce the operational cost of providing multilingual support in your contact center.

Check out this demo of Twilio Flex integrated with the OpenAI Realtime API, where a humanitarian aid helpline supports a Spanish-speaking caller and an English-speaking staff member:

Live Translation Sample App

Our new CodeExchange sample app provides a working implementation of bidirectional voice language translation between a caller and a contact center agent. This starter app is meant to help you get up and running quickly while providing a foundation for your continued development and customization.

The sample app is middleware that connects Twilio’s platform to the OpenAI Realtime API. It intercepts audio from the caller, has it translated, and plays the translated audio to the contact center agent in the agent’s preferred language, and does the same in the other direction.

Here’s how it works:

  • Caller selects a language: The caller is greeted with an Interactive Voice Response (IVR) to confirm their preferred language.
  • OpenAI prepares for translation: The middleware app initializes a connection to the OpenAI Realtime API and prompts it with instructions for bidirectional translation based on the caller’s input.
  • Connect to an agent: The conversation is queued, and once the caller is connected to an agent, the app intercepts each participant’s audio.
  • Real-time translation: The middleware forwards each participant’s audio to the OpenAI Realtime API, which translates it, and then sends the translated voice to the other party. The conversation continues with both parties speaking in their preferred languages.
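The relay steps above can be sketched in a few lines of Python. This is a minimal illustration, not the sample app’s code: it assumes one Realtime session per direction (caller to agent, and agent to caller), which keeps each party’s audio separate, and the prompt wording and function names are hypothetical. It uses Twilio Media Streams “media” messages and the Realtime API beta events session.update, input_audio_buffer.append, and response.audio.delta. Because both sides carry base64-encoded 8 kHz G.711 μ-law audio, the payload can be passed through without transcoding.

```python
import json
from typing import Optional


def build_session_update(source_language: str, target_language: str) -> dict:
    """Build a Realtime API session.update event for one translation direction."""
    # Hypothetical prompt; the sample app's actual instructions may differ
    instructions = (
        f"You are a live interpreter. Repeat everything you hear in "
        f"{source_language} as spoken {target_language}. Translate only; "
        "do not answer questions or add commentary."
    )
    return {
        "type": "session.update",
        "session": {
            "instructions": instructions,
            "modalities": ["text", "audio"],
            # Twilio Media Streams carry 8 kHz G.711 mu-law, so use it both ways
            "input_audio_format": "g711_ulaw",
            "output_audio_format": "g711_ulaw",
            # Let the server detect when a speaker pauses and respond automatically
            "turn_detection": {"type": "server_vad"},
        },
    }


def twilio_to_openai(twilio_message: str) -> Optional[str]:
    """Turn a Twilio Media Streams 'media' message into an input_audio_buffer.append event."""
    message = json.loads(twilio_message)
    if message.get("event") != "media":
        return None  # connected/start/mark/stop events are ignored in this sketch
    # The base64 mu-law payload is forwarded unchanged
    return json.dumps(
        {"type": "input_audio_buffer.append", "audio": message["media"]["payload"]}
    )


def openai_to_twilio(openai_event: str, other_party_stream_sid: str) -> Optional[str]:
    """Turn a response.audio.delta event into a Twilio 'media' message for the other party."""
    event = json.loads(openai_event)
    if event.get("type") != "response.audio.delta":
        return None  # transcripts, response.done, etc. are ignored here
    return json.dumps({
        "event": "media",
        "streamSid": other_party_stream_sid,
        "media": {"payload": event["delta"]},
    })
```

In a running app, frames from the caller’s stream would feed the caller-to-agent session, and that session’s audio deltas would be written to the agent’s stream, with the mirror image for the agent’s speech.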

The result? Your customers and agents communicate naturally, no matter the language barrier.

Twilio products used in conjunction with OpenAI's Realtime API include:

  • Voice - powers connectivity to the PSTN and intercepts audio using Media Streams.
  • Studio - presents the initial IVR for callers to select a preferred language.
  • Flex - integrates the solution into a contact center environment, providing an agent interface, task queuing, and more.
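On the Voice side, a call’s audio reaches the middleware through Media Streams. A bidirectional stream, which lets the middleware play translated audio back into the call, is started with the TwiML Connect and Stream verbs. Here is a minimal Python sketch that builds that TwiML with the standard library. The WebSocket URL and the “party” parameter are hypothetical, and how the sample app attaches streams to the caller and agent legs in Flex is not shown here.

```python
import xml.etree.ElementTree as ET


def stream_twiml(websocket_url: str, party: str) -> str:
    """Build TwiML that opens a bidirectional Media Stream to the middleware."""
    response = ET.Element("Response")
    # <Connect><Stream> (rather than <Start><Stream>) makes the stream bidirectional
    connect = ET.SubElement(response, "Connect")
    stream = ET.SubElement(connect, "Stream", url=websocket_url)
    # <Parameter> values arrive in the stream's "start" message (customParameters),
    # letting the middleware tell the caller's stream from the agent's
    ET.SubElement(stream, "Parameter", name="party", value=party)
    return ET.tostring(response, encoding="unicode")
```

The same helper could be called with a different party label for each leg, so a single WebSocket endpoint can route audio for both participants.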

Our open source app includes setup guidance and a set of settings you can customize. For instance, you can enable the FORWARD_AUDIO_BEFORE_TRANSLATION setting, which forwards the original audio to the other party while the translation is in progress. This reduces perceived silence in the conversation and makes interactions feel smoother.
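A minimal Python sketch of how such a flag might gate audio routing, assuming it is supplied as an environment variable. The helper names and the “openai” and “other_party” route labels are hypothetical, and the sample app’s actual handling may differ.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean setting such as FORWARD_AUDIO_BEFORE_TRANSLATION from the environment."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def route_inbound_frame(payload: str, forward_original: bool) -> dict:
    """Return where one inbound audio frame (base64 mu-law) should be sent."""
    routes = {"openai": payload}  # every frame is still sent for translation
    if forward_original:
        # Also let the other party hear the original speech while translation is pending
        routes["other_party"] = payload
    return routes


# Read once at startup in this sketch
FORWARD_ORIGINAL = env_flag("FORWARD_AUDIO_BEFORE_TRANSLATION")
```

Sending every frame to translation regardless of the flag keeps the translated output unchanged; the flag only adds a second destination.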


Of course, this starter app is just the beginning. We encourage you to customize it to fit your needs, whether by modifying the app’s logic, adjusting the LLM prompt, or exploring additional settings in OpenAI’s Realtime API docs. For instance, you could replace the basic IVR with a personalized virtual agent that detects a known customer’s preferred language from their Unified Profile, so they don’t have to be asked on every call.
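The fallback logic behind that idea can be sketched in a few lines of Python. The profile is modeled as a plain dictionary with a hypothetical preferred_language field; it does not reflect the actual Unified Profile schema or API, and the IVR digit mapping is also an assumption.

```python
from typing import Mapping, Optional

# Hypothetical mapping from IVR keypad digits to languages
IVR_LANGUAGES = {"1": "English", "2": "Spanish"}


def resolve_caller_language(
    profile: Optional[Mapping[str, str]], ivr_digit: Optional[str] = None
) -> Optional[str]:
    """Prefer a language stored on the customer's profile, else fall back to the IVR choice."""
    if profile and profile.get("preferred_language"):
        return profile["preferred_language"]  # known customer: skip the IVR question
    if ivr_digit is not None:
        return IVR_LANGUAGES.get(ivr_digit)  # None if the digit isn't mapped
    return None  # language unknown: the caller still needs to be asked
```

A None result is the signal to fall back to the existing IVR prompt, so the personalized path can be added without removing the original flow.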

Start Building Today

Eager to integrate live translation into your contact center? Our sample app on CodeExchange makes it straightforward to try for yourself.

Start building now with our CodeExchange app.

The possibilities for live voice translation are vast, with applications from retail and financial services to nonprofit organizations and public sector entities. With Twilio Flex and OpenAI Realtime, your contact center can support customers across many languages, offering a more inclusive, efficient, and cost-effective customer experience.

As always, we can’t wait to see what you build!

Jeff Eiden, Director of Product at Twilio.